[0001] This application is a continuation in part of application Ser. No. 10/335,272 filed Jan. 31, 2002, now pending. The Ser. No. 10/335,272 application is incorporated by reference herein, in its entirety, for all purposes.
[0002] The present invention relates to network management. More specifically, the present invention is an automated change management system and method to manage diverse management functions across a network in an automated fashion.
[0003] It is difficult to imagine a communication process that does not involve a collection of devices connected by a network. Networks carry voice and data communications for communication, entertainment, business, and defense endeavors to name but a few. For a variety of reasons, most networks are collections of smaller sub-networks that are managed first at the sub-network level and then at the integrated network level. Management comprises configuring devices for connection to the network, monitoring and reporting on network and device loads, and managing device failure.
[0004] A device is often managed by a variety of applications depending on the function to be managed. For example, the workload of a device may be managed by application A supplied by vendor A and the configuration of a device may be managed by application B supplied by vendor B. In this example, application A is configured via a script to manage device A and reports its results to a workload database. Application B is configured using a text file to manage the configuration of device B and reports its results to a configuration database. Typically, applications A and B cannot directly communicate with each other or share data.
[0005] In modern day networks such as wireless networks, intranets or the Internet, there are a number of network devices of various types. Such network devices may be workstations, routers, servers, and a wide variety of other smart devices that appear on networks. Network management tools have evolved to manage these devices. As networks have increased in size and complexity, network management functions have become increasingly resource intensive.
[0006] Network management comprises a number of functions, including (but without limitation) fault management, configuration management, performance management, security management, inventory management and cost management. Of these functions, configuration management is of particular importance as it affects in varying degree the effectiveness of the other network management systems in managing all of the other functions.
[0007] Most devices and applications on a network (sometimes collectively referred to as objects) are designed to be configured, thus broadening the applications for which a particular object can be used. The information comprising an object's configuration is both object and context dependent. That is, the configuration of a device may depend on the device, where in a network it is installed, what it is connected to, what applications it is intended to run, and the like. In order for a network to operate efficiently, the configuration of the various objects comprising the network must be known at all times. An unplanned change in the configuration of a router, for example, may cause the network performance to deteriorate or to fail altogether, may result in increased error reporting and error correction processing time, and may cause the network operator to expend resources to locate and correct the configuration error.
[0008] Network management tools have been developed to detect changes in the configurations of critical network components. These tools monitor the configuration files of such objects, issue alarms when a change is detected, and offer manual or automatic restoration of the changed configuration file to a file known to be good. However, current configuration monitoring tools are reactive. Such tools can determine that a configuration has changed, but cannot initiate a reconfiguration of specific devices or applications on the network or sub-network, or relate the configuration of one device on a network to another device on that network without human intervention. Rather, many traditional network management systems are maintained by hand-entering device lists into individual network management applications with no common ties between the different applications.
[0009] Whenever a network device is changed or upgraded, it frequently becomes necessary to ensure that the upgrade is populated throughout the network in order for devices to talk to one another in an error-free way. The difficulty with updating distributed network devices is that this typically occurs on a device-by-device basis. Therefore the possibility of human error is ever present. Misentering or omitting device information in different network management applications results in a network that is not effectively managed. Further, if different network management applications are present on various network devices, over time, the network applications become increasingly asynchronous, resulting in critical failures and the potential for loss of visibility on the network of various devices.
[0010] At any point in time, it is desirable for a network management application to know the configuration of each configurable device that such network management application is managing. This is accomplished by the network management application polling the managed devices and keeping a record of the polled data. However, networks with a large number of network management applications have difficulty synchronizing against a single inventory of devices and synchronizing device status over all of the network management applications. And, as previously noted, the network management applications are typically from diverse vendors and may not be able to communicate with each other. The result is that over the network, the data used to manage the configuration of network devices and network device polling applications is not current, and becomes less current (more asynchronous) as time goes on.
[0011] Various approaches to improving network management systems have been disclosed. U.S. Pat. No. 5,785,083 ('083 Patent) to Singh, et al. entitled “Method And System For Sharing Information Between Network Managers,” discloses a technique for managing a network by sharing information between distributed network managers that manage a different portion of a large network. Databases in the different network managers can be synchronized with each other. The information that is shared is to be used by an end-user who monitors the network and takes corrective action when necessary.
[0012] U.S. Pat. No. 6,295,558 ('558 Patent) to Davis, et al., entitled “Automatic Status Polling Failover For Devices In A Distributed Network Management Hierarchy,” discloses an automatic failover methodology whereby a central control unit, such as a management station, will automatically take over interface status polling of objects of a collection station that is temporarily unreachable. The '558 Patent teaches a failover methodology that reassigns polling responsibility from a failed collection station to a central control unit (such as a management station). A polling application at the central control unit obtains the topology of the failed collection station and performs polling until the polling station returns to operational status.
[0013] U.S. Pat. No. 6,345,239 (the '239 Patent) to Bowman-Amuah, entitled “Remote Demonstration Of Business Capabilities In An E-Commerce Environment,” discloses and claims a system, method and article of manufacture for demonstrating business capabilities in an e-commerce environment. The '239 Patent discloses, but does not claim, network management functionality that refers to synchronization of configuration data over a communication system as an objective. The disclosures, made in the context of a discussion of a network configuration and re-routing sub-process, describe functions but not means.
[0014] U.S. patent application Ser. No. 20020057018 (the '018 Application) to Branscomb, et al., entitled “Network device power distribution scheme,” discloses and claims a telecommunications network device including at least one power distribution unit capable of connecting to multiple, unregulated DC power feeds. The '018 Application further discloses (but does not claim) an approach to a network management system that features a single data repository for configuration information of each network device. Network servers communicate with network devices and with client devices. Client devices communicate with a network administrator. The administrator can use a client to configure multiple network devices. Client devices also pass configuration requirements to the network servers and receive reports from the network relating to configuration data of network devices. According to this approach, pushing data from a server to multiple clients synchronizes the clients with minimal polling, thus reducing network traffic. Configuration changes made by the administrator directly are made to the configuration database within a network device (through the network server) and, through active queries, automatically replicated to a central NMS database. In this way, devices and the NMS are always in synch.
[0015] The approaches described in these references relate to managing the network manually. What would be particularly useful is a system and method that automates the change management process in real time using a two-way communications model that permits a central database to effect changes on all or some network management applications/systems in the field, while also allowing those same field systems to affect the central database. It also would be desirable for such a system and method to update all network management applications on the network upon the occurrence of a change in a network device and to manage failover through logically assigned buddies. Finally, such a system and method would also decrease the errors associated with human intervention to update network management applications.
[0016] An embodiment of the present invention is a system and method for managing and synchronizing network management applications from a common source. A change management process is automated by employing a real time two way communications model that permits a central database comprising the latest network management software and configuration to effect changes on all or some network management applications and systems in the field.
[0017] It is therefore an aspect of the present invention to eliminate human errors associated with updating network management applications.
[0018] It is a further aspect of the present invention to ensure that network applications are synchronized when a network device is added or removed, or when the configuration of a network device is changed.
[0019] It is yet another aspect of the present invention to significantly reduce the time required to update network monitoring systems when device changes occur in the network.
[0020] It is still another aspect of the present invention to create and install a configuration file on the network management system applications for any new network device added to the network.
[0021] It is still another aspect of the present invention to provide application fail over capabilities for those devices using the same application and between different applications on a network according to certain rules and based on logically assigned backup servers (“buddies”).
[0022] It is yet another aspect of the present invention to automatically detect changes in devices on the network and immediately update all network management system applications associated with changed devices.
[0023] It is still another aspect of the present invention to update a central database concerning all network management applications and devices on the network.
[0024] It is still another aspect of the present invention to maintain complete synchronization of all devices that are being monitored on a network.
[0025] These and other aspects of the present invention will become apparent from a review of the description that follows.
[0026] In an embodiment of the present invention, a change management engine synchronizes the configuration of distributed network management applications, and also synchronizes device status from those same distributed network management applications with a central database. “Change management” as used in this context means the process by which network management poller and aggregation applications are synchronized to the exact configurations of the devices they monitor in real time without human intervention. The network can be a wired or wireless network. Further, embodiments of the present invention operate on an intranet, the Internet, or any other wired or wireless network that is to be managed as an entity. These embodiments operate in an application-diverse environment, allowing the synchronization of networks that use applications of different vendors to perform various network management functions.
[0027] In an embodiment of the present invention, the change management process is automated by employing a real time two way communications model that permits a central database comprising the latest network management software and configuration to effect changes on all or some network management applications and systems in the field. In this embodiment, field systems also affect the central database by transmitting polled information into that database. Each network device is entered into a central database one time. After the initial data entry, this embodiment of the present invention handles all of the processes associated with configuring different and distributed network management systems and applications in the field. Thus, this embodiment of the present invention acts as a manager of other system managers in order to ensure that all network management applications are synchronized across the network, and binds many disparate functions of change management under one control model. Further, automating the configuration process reduces the risk that human error will disrupt the monitoring of critical systems.
[0028] In yet another embodiment of the present invention, the process of handing over the tasks of a failed monitoring device (fail over) is managed in real time. This embodiment allows a single graphical user interface to be the means of monitoring a plurality of devices over the network. The plurality of devices is polled by any number of different servers and applications, with responses from the polling reported via Simple Network Management Protocol (SNMP) to a central database. Thus a unified view of the status of each of the devices on the network is created and monitored.
[0039] The description of the present invention that follows utilizes a number of acronyms the definitions of which are provided below for the sake of clarity and comprehension.
[0040] APISC—Application Programming Interface Super Controller
[0041] ASCII—American Standard Code for Information Interchange
[0042] DIDB—Device Inventory Database
[0043] DPM—Data Poller Module
[0044] DSM—Distributed Status Monitor
[0045] FTP—File Transfer Protocol
[0046] GUI—Graphical User Interface
[0047] ID—Identification
[0048] IP—Internet Protocol
[0049] NDB—Network Database
[0050] NMS—Network Management System
[0051] NOC—Network Operations Center
[0052] Object—a network application or network device that is configurable.
[0053] ODBC—Open Database Connectivity
[0054] OID—Object Identifier
[0055] OSPF—Open Shortest Path First Interior Gateway Protocol
[0056] RDC—Regional Data Center
[0057] SNMP—Simple Network Management Protocol
[0058] TMP—Temporary
[0059] In addition, certain NMS software products are referred to by their product names, which include the following:
[0060] Netcool (MicroMuse, Inc.)
[0061] Visionary (MicroMuse, Inc.)
[0062] Internet Service Monitor or “ISM” (MicroMuse, Inc.)
[0063] Remedy (BMC Software, Inc.)
[0064] Referring to
[0065] Central database
[0066] Each NMS poller server/data aggregator pair manages the sub-network to which it is assigned by polling the sub-network for relevant data. The particular tasks performed by a NMS poller server depend on the application software running on that server. Typical tasks include monitoring network devices for changes in configuration, performance, load, and environmental parameters, analyzing the data received from network devices, and sending the data to the central database
[0067] In the NMS illustrated in
[0068] Referring now to
[0069] In an embodiment of the present invention, application server
[0070] In an embodiment of the present invention and as illustrated in
[0071] Core engine
[0072] In another embodiment, the autocontroller resides on each server that contains network management applications requiring core engine control. The autocontroller installs updated configuration files, launches and restarts applications, executes shell commands, parses and analyzes output files, returns any requested results back to the core engine, and backs up another autocontroller (a “buddy”). With respect to this latter function, an autocontroller is capable of performing the functions of its buddy autocontroller should the buddy autocontroller experience a failure. Additionally, each autocontroller comprises redundancy features to determine when the assigned buddy autocontroller fails or becomes unreachable. While
[0073] The network management systems illustrated in
[0074] Referring to
[0075] The new configuration data are stored in the DIDB (
[0076] The change management process illustrated in
[0077] The exemplary embodiments that follow are intended to illustrate aspects of the present invention, but are not meant as limitations. As will be apparent to those skilled in the art, the present invention may be practiced in embodiments other than the exemplary embodiments described herein without departing from the scope of the present invention.
[0078] A. The Core Engine
[0079] Referring to
[0080] The core engine
[0081] In another exemplary embodiment, the core engine uses static memory resident structures
[0082] In another exemplary embodiment of the present invention, the core engine comprises a data poller module (DPM)
[0083] In yet another exemplary embodiment, the DPM
[0084] With respect to devices that have been flagged as “changed”, the core engine
[0085] B. The Autocontroller
[0086] Referring to
[0087] According to the exemplary embodiment illustrated in
[0088] One of the primary functions of the autocontroller is to update files for network management applications in the field with files created by the core engine. After being generated by the core engine, the freshly created configuration files, binary files, modules and the like are transferred to the appropriate application server. In an exemplary embodiment of the present invention, this transfer is accomplished via file transfer protocol (FTP) or secure copy protocol (SCP) and the transferred file is stored in an incoming directory
[0089]
[0090] The first word, acfile, identifies the file as one that the autocontroller should process. The <ID> represents the instance number in the meta-data configuration file. The <TAG> is one of the filename and tags listed in the table above. The optional [DSM] defines the DSM to which this file pertains, and is used by the event reporting module and applications running on the NMS poller servers. As will be apparent to those skilled in the art, other file formats capable of conveying file, TAG, and DSM identifying information may be employed without departing from the scope of the present invention.
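As a minimal sketch of how an autocontroller might recognize and decompose such a file name, assume that the parts are joined by periods, for example acfile.3.config or acfile.3.config.dsm4. The separator, the sample tag names, and the names TransferName and parse_transfer_name are hypothetical and are used only for illustration; the description above fixes only the order of the parts.

```python
from typing import NamedTuple, Optional


class TransferName(NamedTuple):
    instance_id: int     # <ID>: instance number in the meta-data configuration file
    tag: str             # <TAG>: identifies the kind of file being delivered
    dsm: Optional[str]   # optional [DSM] to which the file pertains


def parse_transfer_name(filename: str, sep: str = ".") -> Optional[TransferName]:
    """Split a transfer file name into its parts.

    Returns None for any file the autocontroller should not process.
    The separator is an assumption, not taken from the description.
    """
    parts = filename.split(sep)
    # Expect: acfile, <ID>, <TAG>, and optionally <DSM>.
    if len(parts) not in (3, 4) or parts[0] != "acfile":
        return None
    id_text, tag = parts[1], parts[2]
    if not (id_text.isascii() and id_text.isdigit()) or not tag:
        return None
    dsm = parts[3] if len(parts) == 4 else None
    if dsm == "":
        return None
    return TransferName(int(id_text), tag, dsm)
```

Returning None rather than raising lets a caller skip unrelated files in the incoming directory without special handling.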
[0091] Each application governed by the autocontroller is unique and requires customized code for such management tasks as being stopped, started, restarted, manipulated, or directed. To that end, the autocontroller has an application code module
[0092] If the autocontroller successfully processes a given transfer file, the file is compressed and archived in a storage directory
[0093] The shell command processor
[0094] The shell commands executed using this feature run from the same account as the autocontroller, which is never the root user. Each command is run individually and has its output directed to a log file that the autocontroller will later analyze and return to the core engine as a result file. This logging allows the core engine to confirm that each shell command executed properly, and provides an easy mechanism for gathering data from the field servers. The format of the shell command input file consists of each shell command to be executed on a single line of ASCII text.
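The behavior described above can be sketched as follows: each line of an ASCII input file is run as a separate shell command, and its output is appended to a log file for later analysis. The function name run_command_file, the log layout, and the timeout are assumptions made for illustration; this is not the autocontroller's actual implementation.

```python
import subprocess
from pathlib import Path
from typing import List


def run_command_file(command_file: Path, log_file: Path,
                     timeout_s: float = 60.0) -> List[int]:
    """Run each non-empty line of command_file as its own shell command.

    Output and exit status of every command are appended to log_file.
    Commands run under the account of the calling process, so the
    caller is responsible for not running as root.
    """
    exit_codes: List[int] = []
    with log_file.open("a", encoding="utf-8") as log:
        for line in command_file.read_text(encoding="ascii").splitlines():
            command = line.strip()
            if not command:
                continue  # skip blank lines
            try:
                done = subprocess.run(command, shell=True, capture_output=True,
                                      text=True, timeout=timeout_s)
                code, output = done.returncode, done.stdout + done.stderr
            except subprocess.TimeoutExpired:
                code, output = -1, "timed out\n"  # -1 is a hypothetical marker
            # Hypothetical log layout: command, its output, then exit status.
            log.write(f"$ {command}\n{output}[exit {code}]\n")
            exit_codes.append(code)
    return exit_codes
```

Keeping one log entry per command is what would allow a result analyzer to confirm that each command executed properly.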
[0095] According to an exemplary embodiment, a result analyzer module
[0096] In its simplest form for shell commands, the result analyzer module
[0097] In a more complex context, a result analyzer module
[0098] A file return module
[0099] This generic operation of the file return module
[0100] In another exemplary embodiment, each autocontroller supports a redundancy module
[0101] The autocontroller has an internal ping module
[0102] Similarly, in another exemplary embodiment of the present invention, if connectivity to a buddy autocontroller is lost the autocontroller redundancy module initiates tasks to reestablish communication with the buddy autocontroller. The following cause/effect scenarios are accounted for in this embodiment of the autocontroller redundancy module:
[0103] Cause. Connectivity to the APISC core server is lost.
[0104] Effect.
[0105] All autocontroller instances involved will send alarm traps and e-mails, and log the event.
[0106] The autocontroller will launch one or more backup instances of the error reporting module in order to capture critical SNMP data in local files, which can then be transferred and uploaded to the NDB later.
[0107] When the core engine becomes reachable again, it commands the autocontroller to resume normal communication with the core engine.
[0108] The backup error reporting instances are shut down and their locally held data files are moved into the outgoing directory for transport.
[0109] Once in the outgoing directory the return file module will handle the actual transport back to the core engine.
[0110] Cause. Connectivity to a buddy NMS poller server is lost.
[0111] Effect.
[0112] All autocontroller instances involved will send alarm traps and e-mails, and log the event.
[0113] The autocontroller will launch a backup instance of the DSM to support and poll the devices normally polled by the unreachable buddy. This involves launching DSM No.
[0114] The autocontroller used by the event reporting servers will launch a modified version of event reporting module
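The cause/effect scenarios above can be read as reactions to changes in reachability. The following sketch, under that assumption, maps the previous and current reachability of the core engine and of the buddy to a list of action names. The function name and the action names are hypothetical labels for the effects listed above, and the sketch treats the core engine's resume command simply as the core becoming reachable again.

```python
from typing import List


def redundancy_actions(core_was_up: bool, core_is_up: bool,
                       buddy_was_up: bool, buddy_is_up: bool) -> List[str]:
    """Return the actions implied by a change in reachability."""
    actions: List[str] = []
    if core_was_up and not core_is_up:
        # Connectivity to the core server is lost.
        actions += ["send_alarms_and_log", "launch_backup_error_reporting"]
    elif not core_was_up and core_is_up:
        # Core is reachable again: stop backups and hand over local data.
        actions += ["shutdown_backup_error_reporting",
                    "move_local_data_to_outgoing"]
    if buddy_was_up and not buddy_is_up:
        # Connectivity to a buddy NMS poller server is lost.
        if "send_alarms_and_log" not in actions:
            actions.append("send_alarms_and_log")
        actions.append("launch_backup_dsm_for_buddy")
    return actions
```

Recovery of a buddy is deliberately left out of the sketch because the scenarios above do not describe it.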
[0115] C. Core Engine Configuration
[0116] According to an exemplary embodiment of the present invention, the core engine utilizes two configuration files to perform all of its necessary operations: a meta-configuration file and an object identifier (OID) configuration file. These files contain specific instructions for the management of network management applications. In this exemplary embodiment, the core engine and the autocontroller use the same meta-configuration file, which allows the core and field elements to remain completely synchronized. The configuration file is read in when the autocontroller boots. This file is broken down into three main sections using a single simplified attribute/value pair table that is designed for direct integration with the DIDB. In this manner, the DIDB controls the activities of each field autocontroller instance. The meta-configuration file contains three fields: an integer ID field and the attribute and value fields of each pair. The ID number determines the application instance to which each attribute/value pair belongs. The first section designates the core engine, the second the autocontroller, and the remaining sections are for each application instance.
[0117] Referring to
[0118] Another attribute of this format is that it is standardized and can be easily understood. The purpose of each variable is incorporated into its name, using a logical naming convention. If more than one word comprises a variable, each word in the variable is capitalized (example: PollingSite). The meta-data design is extensible to any number of application instances without requiring structural changes. This feature of the configuration file is especially useful in network management systems with large network device inventories.
[0119] The meta-data format further accommodates the creation and propagation of the same network management tool's configuration file to several locations. For example, multiple instances of an application may have unique instances defined in the configuration file. Because both the core engine and each autocontroller use the same configuration file, the core engine and the inventory of autocontrollers are always synchronized with one another.
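To make the three-field layout concrete, the following sketch groups (ID, attribute, value) rows into one dictionary per section and treats the first section as the core engine, the second as the autocontroller, and the rest as application instances. The function name group_meta_config, the row ordering, and every attribute name other than PollingSite are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str, str]          # (ID, attribute, value)
Section = Dict[str, str]


def group_meta_config(rows: Iterable[Row]
                      ) -> Tuple[Section, Section, Dict[int, Section]]:
    """Group rows by ID and split them into core, autocontroller, and apps.

    Assumes rows arrive in section order, so the first ID seen belongs to
    the core engine and the second to the autocontroller.
    """
    sections: Dict[int, Section] = {}
    for instance_id, attribute, value in rows:
        sections.setdefault(instance_id, {})[attribute] = value
    ordered = list(sections.items())  # dicts keep insertion order (Python 3.7+)
    if len(ordered) < 2:
        raise ValueError("need at least a core and an autocontroller section")
    core = ordered[0][1]
    autocontroller = ordered[1][1]
    applications = dict(ordered[2:])
    return core, autocontroller, applications
```

Because a new application instance only adds rows with a new ID, this layout grows without any change to the table structure.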
[0120] At application boot time, the autocontroller attempts to connect to the DIDB and read its meta-configuration file using scripts. If this succeeds, a fresh local backup of the meta-configuration is saved to disk. If it fails, the autocontroller issues an alarm and falls back to the last known good copy of the meta-configuration file stored on disk. Once the meta-configuration file is read, it is stored in memory structures that mimic the file structure.
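The boot-time behavior just described follows a common "fetch, back up, or fall back" pattern, sketched below. The callables fetch_from_didb and alarm stand in for the DIDB query scripts and the alarm mechanism, and the JSON backup format is an assumption; none of these names come from the description.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

Row = Tuple[int, str, str]  # (ID, attribute, value)


def load_meta_config(fetch_from_didb: Callable[[], List[Row]],
                     backup_path: Path,
                     alarm: Callable[[str], None]) -> List[Row]:
    """Read the meta-configuration, preferring the DIDB over the local copy."""
    try:
        rows = fetch_from_didb()
    except Exception as exc:
        # DIDB read failed: raise an alarm and use the last known good copy.
        # If no backup exists yet, the file error propagates to the caller.
        alarm(f"DIDB read failed, using last known good copy: {exc}")
        saved = json.loads(backup_path.read_text(encoding="utf-8"))
        return [(int(i), str(a), str(v)) for i, a, v in saved]
    # DIDB read succeeded: refresh the local backup before returning.
    backup_path.write_text(json.dumps(rows), encoding="utf-8")
    return rows
```

Writing the backup only after a successful read keeps a failed or partial fetch from overwriting the last known good copy.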
[0121] Referring to
[0122] As illustrated in
[0123] Loopback IP—the IP address of the device listed in the DIDB. This field acts as the primary key for each device;
[0124] SNMP index—the integer SNMP index value for the device interface to which this OID applies. A value of ‘0’ indicates that the OID is a chassis OID and thus does not apply to any interface. The value of ‘−1’ indicates that the OID should apply to all interfaces on the device;
[0125] OID—the dot-notated form of the OID being polled;
[0126] Polling frequency—how often the OID is to be polled in seconds. A value of 300 thus indicates that the OID is to be polled once every five minutes; and
[0127] Status—a binary integer (0/1) that determines whether the OID is active or inactive. In the exemplary embodiment, the status field is used to turn off regularly scheduled polling of OIDs during outages, maintenance windows, failover scenarios, and the like.
[0128] The OID configuration file is similar in structure to a base configuration file, with the addition of two fields—‘Polling Interval’ and ‘Status’. The format thus allows each device and device interface known to the DIDB to have OIDs defined at custom intervals for retrieval, storage in the NDB, and reporting. Another similarity to the base meta-configuration file is that the OID configuration file is prepared from a table in the DIDB schema, and the same OID configuration file is used by all autocontroller instances.
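The five fields listed above can be modeled as a small record, together with a helper that decides which OIDs are due for polling at a given time. In this sketch the comma-separated line format, the use of 1 for "active", the interface lookup table, and the names OidEntry, parse_oid_line, and due_polls are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

CHASSIS = 0           # OID applies to the chassis, not to any interface
ALL_INTERFACES = -1   # OID applies to every interface on the device


@dataclass(frozen=True)
class OidEntry:
    loopback_ip: str          # primary key for the device
    snmp_index: int           # interface index, CHASSIS, or ALL_INTERFACES
    oid: str                  # dot-notated OID
    polling_frequency_s: int  # seconds between polls
    status: int               # assumed: 1 = active, 0 = inactive


def parse_oid_line(line: str, sep: str = ",") -> OidEntry:
    """Parse one line; the comma-separated layout is hypothetical."""
    ip, index, oid, freq, status = (f.strip() for f in line.split(sep))
    return OidEntry(ip, int(index), oid, int(freq), int(status))


def due_polls(entries: Iterable[OidEntry], elapsed_s: int,
              interfaces: Dict[str, List[int]]) -> List[Tuple[str, int, str]]:
    """Return (device, index, OID) targets whose polling interval has elapsed."""
    targets: List[Tuple[str, int, str]] = []
    for e in entries:
        if e.status != 1 or e.polling_frequency_s <= 0:
            continue  # inactive entries are never polled
        if elapsed_s % e.polling_frequency_s != 0:
            continue  # not due at this tick
        if e.snmp_index == ALL_INTERFACES:
            indexes = interfaces.get(e.loopback_ip, [])
        else:
            indexes = [e.snmp_index]
        targets.extend((e.loopback_ip, i, e.oid) for i in indexes)
    return targets
```

Setting the status of an entry to 0 removes it from the schedule without deleting its definition, which matches the use of the status field during outages and maintenance windows.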
[0129] The present invention has been described in the context of a network management system in which the data to be synchronized comprises configuration data. The invention is not so limited. In another embodiment, the “network” is a distributed financial system and the data to be synchronized comprises financial variables that are used by various applications of the financial system. In this embodiment, the central database receives reports of changes in financial variables from information gathering applications across a financial network. The core engine monitors the central data structure, determines if a financial variable has changed within the network, then populates the changes to all network applications. In this way, the financial network is “synchronized” as to the variables that are deemed important to the functioning of the financial network. As those skilled in the art of the present invention will appreciate, the present invention can be applied to any system in which disparate components benefit from synchronization (such as billing systems and weather systems) without departing from the scope of the present invention.
[0130] A system and method for the configuration of distributed network management applications and devices has now been illustrated. The management of these devices and applications (sometimes collectively referred to as “objects”) is performed without human intervention. Although the particular embodiments shown and described above will prove to be useful in many applications relating to the arts to which the present invention pertains, further modifications of the present invention herein disclosed will occur to persons skilled in the art. All such modifications are deemed to be within the scope of the present invention as defined by the appended claims.