[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/296,814 filed Jun. 7, 2001, the disclosure of which is hereby incorporated by reference.
[0002] 1. Field of the Invention
[0003] The present invention relates to a method and apparatus for runtime merging of hierarchical trees.
[0004] 2. Background Art
[0005] The Internet has made it possible for computer users to access their applications, files, and data from any web-enabled device. The applications, files, and data are usually stored on a server and are typically organized in a hierarchical manner. There are preferences associated with each layer in the hierarchy so that each user has a unique presentation of the applications, files, and data. The hierarchy is normally stored using a document object model (DOM) and might access data in an extensible markup language (XML) format.
[0006] When multiple preferences exist in the hierarchy, the system must choose between equivalent nodes in multiple trees, so that the preference that takes precedence is chosen in a final “merge tree”. To do so, however, is an expensive process as will be further explained below. Before discussing this problem, an overview is provided.
[0007] Internet
[0008] The Internet is a network connecting many computer networks and is based on a common addressing system and communications protocol called TCP/IP (Transmission Control Protocol/Internet Protocol). From its creation it grew rapidly beyond its largely academic origin into an increasingly commercial and popular medium. By the mid-1990s the Internet connected millions of computers throughout the world. Many commercial computer network and data services also provided at least indirect connection to the Internet.
[0009] The original uses of the Internet were electronic mail (e-mail), file transfers (ftp or file transfer protocol), bulletin boards and newsgroups, and remote computer access (telnet). The World Wide Web (web), which enables simple and intuitive navigation of Internet sites through a graphical interface, expanded dramatically during the 1990s to become the most important component of the Internet. The web gives users access to a vast array of documents that are connected to each other by means of links, which are electronic connections that link related pieces of information in order to allow a user easy access to them. Hypertext allows the user to select a word from text and thereby access other documents that contain additional information pertaining to that word; hypermedia documents feature links to images, sounds, animations, and movies.
[0010] The web operates within the Internet's basic client-server format. Servers are computer programs that store and transmit documents (i.e., web pages) to other computers on the network when asked to, while clients are programs that request documents from a server as the user asks for them. Browser software allows users to view the retrieved documents. A web page with its corresponding text and hyperlinks is normally written in HTML or XML and is assigned an online address called a Uniform Resource Locator (URL).
[0011] XML DOM
[0012] XML is emerging as the next generation of markup languages. XML DOM details the characteristic properties of each element of a web page, thereby detailing how one might manipulate these components and, in turn, manipulate the page. Each component is stored in memory. Components include for instance, objects, properties, methods, and events. An object is a container which reflects a particular element of a page. Objects contain the various characteristics which apply to that element (known as properties and methods). For example, the submit object contains properties and methods relevant to the submit button in a form.
[0013] Properties are characteristics of an object; for example, the document object possesses a bgColor property which reflects the background color of the page. Using a programming language (e.g., JavaScript) one may, via this property, read or modify the color of the current page. Some objects contain very many properties, some contain very few. Some properties are read-only, while others can be modified, possibly resulting in immediate on-screen results.
[0014] A method typically executes an action which somehow acts upon the object by which it is owned. Sometimes the method also returns a result value. Methods are triggered by the programming language being used, such as JavaScript. For example, the window object possesses a method named alert ( ). When supplied with string data, the alert ( ) method causes a window to pop up on the screen containing the data as its message; (e.g., alert(“Invalid entry!”)).
[0015] An event is used to trap actions related to its owning object. Typically, these actions are caused by the user. For example, when the user clicks on a submit button, this is a click event which occurs at the submit object. By virtue of submitting a form, a submit event is also generated, following the click event. Although these events occur transparently, one can choose to intercept them and trigger specified program code to execute.
[0016] Preferences
[0017] Using a web-enabled device to access data, files, and applications over the Internet significantly reduces issues associated with installation, configuration, maintenance, and upgrades for end users and information technology (IT) departments. Furthermore, it eliminates license fees and lowers the total cost of ownership for enterprises and service providers. Enterprises are generally hierarchical in nature. Application and user preferences stored for each user running desktop applications are mostly collected from more than one layer.
[0018] For example, an organization can have users belonging to a group. Any preference data that is absent in the user layer can be picked up from the group layer. Groups can have system-wide global administrators that can dictate enterprise level policies. Layering data in IT organization provides a hierarchical structure to data and decreases redundancy in data storage. Imagine a specific preference data for
[0019] In the prior art, the second alternative is picked more frequently than the first one. For systems where the backend data store for configuration data is XML stored in flat files, this data is fetched through a configuration server which helps the user to read, edit and delete application data through a pure application program interface (API) that is platform independent, such as the Java API. A call to read preference data is translated to the correct set of XML files from the required layers and the data is merged at runtime to create the resultant data for the end client application. Often, the task of merging is left to the configuration server that uses pure DOM API to read XML from all the required layers and merge them into a temporary DOM tree that is written back to a socket stream.
[0020] Problems Associated with Merging DOM Trees On The Fly
[0021] Though the DOM API is simple and easy to use, it comes with the price of memory allocations that are proportionate to the size of data (e.g., XML data) and the depth of nesting. In general, tree traversals are expensive and it is wise to complete the decision of choosing the right information while traversing the tree for the first time. The difficulty in traversing unbalanced trees is due to the fact that leaf nodes present at a particular depth in one tree may not be present in the other trees that need to be traversed. XML DOM exacerbates the problem by not allowing data in a DOM tree to be manipulated without copying the node. Merging data from various layers involves two major traversals:
[0022] 1. The first traversal should traverse all the XML trees from the various layers to find the winning nodes (nodes that should be present in the merged tree). This can be done by copying the data to a temporary tree through the use of the cloneNode( ) method from the DOM API.
[0023] 2. Writing the temporary tree to the output stream used by a TCP socket that transports the data to the client.
[0024] Unfortunately, cloneNode( ) is the most expensive call in an XML parser. It has a list of drawbacks that makes it very unpopular with XML users:
[0025] 1. The call is recursive and so it takes a lot of stack space depending on the depth and size of the XML node that is cloned. It also allocates internal data structures that help build the cloned node that are not freed until the call is complete.
[0026] 2. Cloning large trees can prove to be very expensive specifically when it is used frequently in a multithreaded environment.
[0027] 3. Cloning creates a second copy of the data. If the data is merely meant for reading, creating a copy of the same data does not provide any advantage. In fact, it is disadvantageous because it takes more memory and CPU cycles from the machine.
[0028] 4. The call to cloneNode( ) is typically made frequently, so the costs above are incurred repeatedly.
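The cost of a deep clone compared with simply holding a reference can be illustrated with the standard Java DOM API. The following is a minimal sketch, not the configuration server's code: the class name `CloneCostSketch`, its helper methods, and the sample XML are hypothetical, and only `cloneNode( )` and `isEqualNode( )` come from the DOM API itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class CloneCostSketch {
    /** Parses an XML string into a DOM document; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the nodes in a subtree, including its root. */
    static int countNodes(Node root) {
        int count = 1;
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countNodes(c);
        }
        return count;
    }

    /** Number of nodes that a deep clone of the document element has to allocate. */
    static int nodesAllocatedByDeepClone(String xml) {
        Node root = parse(xml).getDocumentElement();
        return countNodes(root.cloneNode(true));
    }

    /** A plain Java reference aliases the cached node; a deep clone is a distinct but equal copy. */
    static boolean referenceAliasesOriginal(String xml) {
        Node root = parse(xml).getDocumentElement();
        Node reference = root;              // no allocation, no traversal
        Node copy = root.cloneNode(true);   // second copy of every descendant
        return reference == root && copy != root && copy.isEqualNode(root);
    }

    public static void main(String[] args) {
        String xml = "<Office><A><x/><y/></A><D/></Office>";  // hypothetical sample data
        System.out.println(nodesAllocatedByDeepClone(xml));
        System.out.println(referenceAliasesOriginal(xml));
    }
}
```

The deep clone allocates one new node per node in the subtree, whereas the reference allocates nothing, which is the motivation for the reference node described below.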
[0029] Embodiments of the present invention relate to a runtime merge system with a reference node implementation. According to one or more embodiments of the present invention, a reference node is implemented which holds a reference to a node in a DOM tree active in memory. A reference node class is implemented in one embodiment, which allows adding nodes to the merged tree without having to make a clone of the node, which is an expensive operation.
[0030] In one embodiment, if a particular node is not present below a certain level of the tree in any layer except a unique layer, it renders visiting the children of that node unnecessary. In a typical scenario, the user tree may not have any information about a component (Office, for instance). Without a reference node, one had to visit every node in the “Office” component to make a copy of the data. With the use of reference nodes, the invention neither traverses the children of such a node, nor copies any of its contents to a new node. A reference is simply kept to the node in the memory and later this reference is used to print the data to a stream.
[0031] These and other features, aspects and advantages of the present invention will become better understood with regard to the following description, appended claims and accompanying drawings where:
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039] The present invention relates to a method and apparatus for runtime merging of hierarchical trees. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
[0040] Runtime Merge System
[0041] In one embodiment, a merge system is implemented which traverses DOM trees only once to decide on the winning nodes for the merged XML data (i.e., which preference takes precedence). Moreover, the system eliminates the use of cloneNode( ) from the whole infrastructure by defining a new concept of “Reference node.” A reference node is a node that keeps a reference or pointer to the location of the XML node rather than a copy of the node. The result is a very fast and efficient merging system with minimal allocation of memory.
[0042] A merged tree refers to the merging of layers of XML node data, where the node chosen to be included in the merged tree is determined by the highest priority layer. For instance, assume that the merged data for the user preference came from three layers of data: System, Group, and User, in the order of increasing priority. The User layer has the highest priority, followed by Group and then System. Equivalent nodes in each tree are compared. The node in the layer of highest priority, among the layers represented, is added to the merged tree. Once all the nodes have been compared, the decision of which node to add to the merged tree (which we refer to as the winning node) is complete.
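The selection of the winning node among the layers represented can be sketched as follows. This is only an illustration under stated assumptions: the class `WinningNodeSketch`, the `Layer` enum, and the `winner` method are hypothetical names, and the set passed in simply records which layers contain an equivalent node.

```java
import java.util.EnumSet;
import java.util.Set;

final class WinningNodeSketch {
    /** Layers declared in order of increasing priority, as in the example above. */
    enum Layer { SYSTEM, GROUP, USER }

    /** Returns the highest-priority layer among those that contain the node, or null if none does. */
    static Layer winner(Set<Layer> layersContainingNode) {
        Layer[] all = Layer.values();
        for (int i = all.length - 1; i >= 0; i--) {   // scan from highest to lowest priority
            if (layersContainingNode.contains(all[i])) {
                return all[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A node present in System and Group, but absent from User, is taken from Group.
        System.out.println(winner(EnumSet.of(Layer.SYSTEM, Layer.GROUP)));
    }
}
```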
[0043]
[0044] Reference Node Implementation
[0045] Unlike prior art methods that stored the entire fragment of a DOM tree inside memory, embodiments of the present invention use a reference node that holds a reference to a node in an XML DOM tree active in memory. The reference may be a Java reference, a pointer, or other suitable reference to a fragment of an XML DOM tree held in an application cache. A reference node class allows nodes to be added to the merged tree without having to make a clone of the node, which is an expensive operation. The concept of a reference node is simple, yet very powerful. It not only saves copying data, but also avoids visiting every child of a node while building the merged tree.
[0046] If a particular node is not present below a certain level of the tree in any layer except a unique layer, it renders visiting the children of that node unnecessary. In a typical scenario, the user tree may not have any information about a component (Office, for instance). Before the introduction of the concept of reference node, one had to visit every node in the “Office” component to make a copy of the data. With the use of reference nodes, the invention neither traverses the children of such a node, nor copies any of its contents to a new node. A reference to the node is simply kept in the memory.
[0047] In one embodiment, the reference node is a class (defined in Java) similar to classes found in other high level programming languages like C++ that have data members and member functions. The reference node class of the present invention contains an elementNode data member, and a few member functions which include Set (to create a new reference node), Get (to fetch a reference node), Name (to obtain the name of a reference node), and WritetoXML (to write a reference node to its tree, for example, in the form of I/O, disk, or stream on cache). This makes the reference node use minimal memory.
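A reference node class along these lines might look like the sketch below. It is an assumption-laden illustration rather than the class of the embodiment: the member functions are written with Java naming (`set`, `get`, `name`, `writeToXml`) in place of Set, Get, Name, and WritetoXML, the serialization through `javax.xml.transform` is one possible choice, and `ReferenceNodeDemo` with its sample tree is hypothetical.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ReferenceNode {
    private Element elementNode;   // reference to a node in a cached DOM tree, never a copy

    /** Points this reference node at an existing DOM element. */
    void set(Element target) { this.elementNode = target; }

    /** Returns the referenced element itself (the same object, not a clone). */
    Element get() { return elementNode; }

    /** Returns the name of the referenced element. */
    String name() { return elementNode.getNodeName(); }

    /** Serializes the referenced subtree to the given writer (assumed serialization strategy). */
    void writeToXml(Writer out) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(elementNode), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

final class ReferenceNodeDemo {
    /** Builds a hypothetical "Office" element with a single child "A". */
    static Element sampleOffice() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element office = doc.createElement("Office");
            office.appendChild(doc.createElement("A"));
            doc.appendChild(office);
            return office;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReferenceNode ref = new ReferenceNode();
        ref.set(sampleOffice());
        StringWriter out = new StringWriter();
        ref.writeToXml(out);
        System.out.println(ref.name() + " -> " + out);
    }
}
```

Because the only data member is a single reference, the memory used by a reference node does not grow with the size of the subtree it refers to.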
[0048] The elementNode data member extends the XML public class Node, and is hence a standard elementNode found throughout the current computing environment, especially the World Wide Web environment. The reason for having a standard elementNode member is that if in the future a fragment from a different DOM partner needs, for example, to be merged, then the definition of an elementNode of the present invention will work properly.
[0049]
[0050]
[0051] Embodiment of an XML Merging System
[0052] In one embodiment, the merging system is configured to operate on XML data. This embodiment is shown in
[0053] At box
[0054] A shallow clone (a copy of just the root node of the subtree and its attributes, but not including any children) of the node in the highest priority layer is made and added to the temporary XML document at box
[0055] If there are no more children at box
[0056] Clone
[0057] The XML language is structured as a tree, which makes editing of the tree easy and efficient. For example, in order to edit a portion of the tree, a copy or clone of the pertinent portion is made first. This clone is completely detached from the original portion. In other words, any edits made to this clone are not reflected in the original portion. This security feature allows a user to loan a clone of a portion of his/her tree to another user, who can make edits to the loaned portion without affecting the original portion. This clone comes in two varieties, a shallow clone and a deep clone.
[0058] A shallow clone of a tree is a copy of only the first layer of the tree, while a deep clone is a copy of the entire tree, which may have several layers. Embodiments of the present invention eliminate the use of deep cloning, which is not only an expensive operation, but uses computer memory wastefully. The shallow clone, like the wrapper or reference nodes, is lightweight (uses memory sparingly).
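In the standard DOM API, the two varieties correspond to the boolean argument of `cloneNode( )`. The sketch below illustrates the difference; the class `ShallowVsDeepSketch`, its helper methods, and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class ShallowVsDeepSketch {
    /** Parses an XML string and returns its root element; checked exceptions are wrapped. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of children carried by a clone of the root: cloneNode(false) is shallow, cloneNode(true) is deep. */
    static int childCount(String xml, boolean deep) {
        return parseRoot(xml).cloneNode(deep).getChildNodes().getLength();
    }

    /** A shallow clone still carries the attributes of the node it was cloned from. */
    static String shallowCloneAttribute(String xml, String attributeName) {
        Element shallow = (Element) parseRoot(xml).cloneNode(false);
        return shallow.getAttribute(attributeName);
    }

    public static void main(String[] args) {
        String xml = "<B mode=\"x\"><N/></B>";  // hypothetical sample data
        System.out.println(childCount(xml, false));
        System.out.println(childCount(xml, true));
        System.out.println(shallowCloneAttribute(xml, "mode"));
    }
}
```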
[0059] Case Study
[0060] Embodiments of the present invention include an enterprise where desktop applications are accessed via a computer network. For simplicity, the users in the organization may have their data divided into three layers; namely a user layer, a group layer, and an administrative layer (admin). The group layer defines the set of data that is common to all users in the same group (for example, an engineering group or a human resources group). Administrators keep data that are common to the organization or all the groups in the organization. If the user of a group would like to fetch the preference data that pertains to any office application, the information is distributed in three layers, each represented as a DOM tree in memory.
[0061] Nodes (carrying preference data) in the user layer have precedence over the group layer and the group layer has precedence over the administrative layer. So, if the same node is present in the user and group layer, the winning node is picked from the user layer. Similarly, if the node is present in the group and the admin layer, it is picked from the group layer, etc.
[0062] Though an organization can have any number of layers, the following seven cases uniquely identify the conditional blocks in one embodiment of an iterative merge system working on three layers (User, Group, Admin):
[0063] Case 1: XML node present in all the three layers (A,G,U).
[0064] Case 2: XML node present in User and Group (U,G) layers and not present in Admin layer.
[0065] Case 3: XML node present in Group and Admin (G,A) layers and not present in User layer.
[0066] Case 4: XML node present in User and Admin (U,A) layers and not present in Group layer.
[0067] Case 5: XML node present in User (U) only and not present in Admin and Group layers.
[0068] Case 6: XML node present in Admin (A) only and not present in User and Group layers.
[0069] Case 7: XML node present in Group (G) only and not present in Admin and User layers.
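The seven cases correspond to the seven non-empty combinations of three presence flags. The sketch below encodes them as a bit mask and derives, for each case, the winning layer and the kind of entry added to the merged tree: a reference node when the node exists in a single layer, and a shallow clone from the highest priority layer otherwise. The class `MergeCaseSketch`, the constants, and the returned strings are hypothetical names chosen for illustration.

```java
final class MergeCaseSketch {
    // Presence flags; a larger value means a higher priority layer.
    static final int ADMIN = 1, GROUP = 2, USER = 4;

    /** Returns the winning layer for a non-empty presence mask. */
    static String winner(int mask) {
        requireCase(mask);
        if ((mask & USER) != 0) return "User";
        if ((mask & GROUP) != 0) return "Group";
        return "Admin";
    }

    /** A node found in exactly one layer is referenced; otherwise the winner is shallow-cloned. */
    static String action(int mask) {
        requireCase(mask);
        return Integer.bitCount(mask) == 1 ? "reference node" : "shallow clone";
    }

    private static void requireCase(int mask) {
        if (mask < 1 || mask > 7) {
            throw new IllegalArgumentException("node must be present in at least one layer");
        }
    }

    public static void main(String[] args) {
        // Enumerates all seven presence combinations.
        for (int mask = 1; mask <= 7; mask++) {
            System.out.println(mask + ": " + action(mask) + " from " + winner(mask));
        }
    }
}
```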
[0070]
[0071] To achieve a merged tree from the trees in
[0072] Iteration 1: An XML Document element is created with no leaf nodes attached to it. This temporary document is used to create the merged tree.
[0073] Iteration 2: Node “A” is not present in any preceding layers. So a reference node from “A” is created (a node that holds reference to the actual data) and added to the merged tree under “Office”.
[0074] Iteration 3: Node “B” is present in both Group and User and so a shallow copy of just “B” is made from the user layer and added to the merged tree under “Office”.
[0075] Iteration 4: Node “C” is present in both Group and Admin, so a shallow copy of just “C” is made from the group layer and added to the merged tree under “Office”.
[0076] Iteration 5: Node “D” is only present in admin and so a reference for this node is created and added to the merged tree under “Office”.
[0077] Iteration 6: The child under “B” is present only in the Group layer, so a reference node is made of the leaf node, “N”, in the Group layer and added to the merged tree under “Office/B”.
[0078] Iteration 7: The child under “C” is present only in the Admin layer. So a reference is made of the leaf node, “M”, from the Admin layer and added to the merged tree under “Office/C”.
[0079] Iteration 8: Now that the merged tree is created, a recursive print function iterates through the whole merged tree built out of references, and prints the nodes to a stream. One difference between this print function and a standard print function in the parser is that the enhanced print function understands the concept of reference nodes. Once the enhanced print function hits a reference node, it knows how to extract the actual element node in the reference and call its “print” function.
[0080] In one embodiment, the whole mechanism is based on object oriented programming and so each type of node in the tree will do its work to print itself to the stream.
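The iterations above can be approximated by the following sketch. It is not the embodiment's implementation. It rests on several assumptions: equivalent nodes are matched by element name, names are unique among siblings, the root element exists in every layer, the merge is written recursively instead of iteratively, and the print function writes element names only. The class `RuntimeMergeSketch`, its members, and the three sample trees are hypothetical and do not reproduce the trees of the drawings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class RuntimeMergeSketch {
    /** One entry of the temporary merged tree. */
    static final class Merged {
        final Element node;        // the original node (reference) or a shallow clone of the winner
        final boolean reference;   // true: print the referenced subtree as-is
        final List<Merged> children = new ArrayList<>();
        Merged(Element node, boolean reference) { this.node = node; this.reference = reference; }
    }

    /** Merges equivalent nodes; the list holds only layers that contain the node, highest priority first. */
    static Merged merge(List<Element> present) {
        Element winner = present.get(0);
        if (present.size() == 1) {
            return new Merged(winner, true);   // unique layer: keep a reference, do not visit children
        }
        Merged merged = new Merged((Element) winner.cloneNode(false), false);
        // Group the children of all layers by name (assumed to identify equivalent nodes).
        Map<String, List<Element>> byName = new LinkedHashMap<>();
        for (Element layerNode : present) {
            for (Node c = layerNode.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) {
                    byName.computeIfAbsent(c.getNodeName(), k -> new ArrayList<>()).add((Element) c);
                }
            }
        }
        for (List<Element> equivalents : byName.values()) {
            merged.children.add(merge(equivalents));
        }
        return merged;
    }

    /** Reference-aware print: shallow entries print merged children, references print the cached subtree. */
    static void print(Merged m, StringBuilder out) {
        if (m.reference) { write(m.node, out); return; }
        out.append('<').append(m.node.getNodeName()).append('>');
        for (Merged child : m.children) print(child, out);
        out.append("</").append(m.node.getNodeName()).append('>');
    }

    /** Writes element names only; attributes and text are omitted to keep the sketch short. */
    static void write(Element e, StringBuilder out) {
        out.append('<').append(e.getNodeName()).append('>');
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) write((Element) c, out);
        }
        out.append("</").append(e.getNodeName()).append('>');
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the layers (highest priority first), merges them, and prints the merged tree. */
    static String mergeToString(String... layersHighestPriorityFirst) {
        List<Element> roots = new ArrayList<>();
        for (String xml : layersHighestPriorityFirst) roots.add(parseRoot(xml));
        StringBuilder out = new StringBuilder();
        print(merge(roots), out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mergeToString(
                "<Office><A/><B/></Office>",            // hypothetical User layer
                "<Office><B><N/></B><C/></Office>",     // hypothetical Group layer
                "<Office><C><M/></C><D/></Office>"));   // hypothetical Admin layer
    }
}
```

In this sketch, the children of a node found in only one layer are never visited while the merged tree is built; they are touched once, when the reference is printed to the output.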
[0081] The system of various embodiments of the present invention eliminates the deep cloning of nodes and introduces a new concept of reference nodes that helps documents (e.g., XML documents) to be merged at runtime. The data structure required for the temporary merged tree is minimized because the data structure contains just two entities: reference (or wrapper) nodes and shallow clones. Both entities use memory sparingly, thus avoiding the copying of the same data in multiple places. For example, in laboratory experiments, a merged tree formed from 40 Kbytes of data obtained from an Admin tree and 30 Kbytes of data obtained from a User tree occupied approximately 50 bytes of memory space. In prior art, the merged tree would have been almost the same size (70 Kbytes) as the combined data size of the Admin and User trees, but using embodiments of the present invention this size is substantially reduced because of the use of shallow clones and reference nodes. The new runtime merge system increases the speed of merging XML data at runtime. In one embodiment, speed is increased by approximately five times over prior art methods. Embodiments of the present invention also reduce a performance bottleneck (the use of cloneNode( ) to replicate information for XML nodes), improve the response time for clients to fetch preference data, merge data effectively, introduce the use of reference nodes in XML to eliminate the copying of data, and considerably reduce the size of the intermediate data structures that need to be allocated for merging XML documents. In one embodiment, the invention is used to merge data of approximately 30 Kb in Admin tree, 5 Kb in Group tree, and 3 Kb in User tree in approximately 25 to 30 milliseconds.
[0082] Embodiment of Computer Execution Environment (Hardware)
[0083] An embodiment of the invention can be implemented as computer software in the form of computer readable program code executed in a general purpose computing environment such as environment
[0084] Computer
[0085] Network link
[0086] Processor
[0087] Computer
[0088] As with processor
[0089] The mass storage
[0090] In one embodiment of the invention, the processor
[0091] Computer
[0092] Application code may be embodied in any form of computer program product. A computer program product comprises a medium configured to store or transport computer readable code, or in which computer readable code may be embedded. Some examples of computer program products are CD-ROM disks, ROM cards, floppy disks, magnetic tapes, computer hard drives, servers on a network, and carrier waves.
[0093] The computer systems described above are for purposes of example only. An embodiment of the invention may be implemented in any type of computer system or programming or processing environment.
[0094] Thus, a method and apparatus for a runtime merging of hierarchical tree systems using a reference node implementation is described in conjunction with one or more specific embodiments. The invention is defined by the accompanying claims and their full scope of equivalents.