[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 09/842,474, titled “System and Method for Accessing Information Content” and filed on Apr. 25, 2001, which is incorporated herein by reference in its entirety, and which in turn claims priority to U.S. provisional application Serial No. 60/199,858 filed on Apr. 26, 2000, which is also incorporated herein by reference in its entirety. This application is also a continuation-in-part of U.S. patent application Ser. No. 09/843,036, titled “System and Method for Adapting Information Content for an Electronic Device” and filed on Apr. 25, 2001, which is incorporated herein by reference in its entirety, and which in turn claims priority to U.S. provisional application Serial No. 60/199,858 filed on Apr. 26, 2000.
[0002] The present invention relates generally to the field of network communications. More specifically, it relates to a system and method for accessing, adapting, and presenting information content for electronic devices.
[0003] Today, an abundant amount of meaningful and feature rich information content is truly at one's fingertips. Currently, using a personal computer (PC) and PC-based browser, one can find information online regarding just about anything one desires. One can communicate with people on the other side of the U.S. or world, set up a teleconference call, tap into the resources of other computers across the earth, search through the world's finest libraries, and view images from the world's most remarkable museums. One can even use the PC-based browser to watch videos and listen to one's favorite music, monitor the financial markets, find the local weather forecast, go shopping, download application software, and so on. Currently, all of this can be done with a personal computer and a PC-based browser that is tapped into a feature rich network of computers such as the Internet, Intranet, or Extranet.
[0004] At the same time, the field of communications, and more specifically wireless telecommunications, is currently undergoing a radical expansion. This technological expansion allows an electronic device, such as a mobile personal digital assistant (PDA), cellular phone, pager, or other electronic device, to connect to the same information sources, such as a web server or database, as one could with the PC and a PC-based browser.
[0005] Unfortunately, this feature rich information content was developed for a standard PC-based browser, not a mobile or portable electronic device that might be limited in memory, screen size, bandwidth, navigation capabilities, power consumption, processing power, etc. For example, an electronic device with a small screen, such as a portable PDA, may be unsuited to displaying information content that was originally intended for a PC-based browser and for viewing on a 15-inch or larger display monitor. Consequently, the PDA may not be able to faithfully access and display information content as it was originally intended to be viewed. Therefore, it would be desirable to access, organize, and navigate information content including applications.
[0006] In another example, a mobile or wireless device with only low bandwidth capability may be unable to view information content intended for only high bandwidth applications. Parameters such as the mobile or wireless device's network connection, memory capacity, power restrictions, or other limitations of the device may require customization of information content that is delivered to or from the device. Therefore, it would be desirable to streamline the information content such that the desired content is received and presented at the mobile device as it was intended to be viewed.
[0007] In yet another example, current electronic devices cannot take full advantage of dynamically generated content and interactive Web sites that are typically hosted on today's Web servers. According to this example, scripting languages like JavaScript or JScript allow a user on a PC-based browser to interact with markup language such as Hypertext Markup Language (HTML) source code, thus enabling the use of dynamic content. However, it would be desirable for a portable electronic device, although possibly having limited abilities, to also utilize the modern and current scripting languages.
[0008] Currently, information content is sent to the device, but often in a format that the appliance, user, or network cannot conveniently accommodate, which produces undesirable results. For example, the data content might be unreadable on the display, displayed in an unorganized fashion, be too voluminous or bandwidth intensive to be received or displayed, and so on.
[0009] Creators of content may use tables and/or frames to control the placement of images and text and take better advantage of the available display screen. For further effects, tables and/or frames can also be nested within tables. Tables providing content designed for larger desktop screen displays, however, may not fit on smaller screen displays, such as on a personal digital assistant, cell phone, pager or internet information appliances, or even when windows are resized. For example, content may be wider than the available screen display, with portions of the content then being rendered outside the viewable area of the display. The content is then only partially viewable with portions chopped off, which is at a minimum aesthetically displeasing and may also reduce the readability and usability of the content when significant portions are cut off.
[0010] To view the content information outside the limited viewable area, a user scroll request such as horizontal scrolling across the content page may be necessary. By horizontally scrolling, the user can selectively show the hidden portions of the content information. Scrolling, however, requires additional input from the user, which is inconvenient, and still does not allow the entire page to be seen at once. Scrolling across the page to see the hidden content usually hides the portion of the page that was originally viewable, so the entire content is still not viewable. Horizontal scrolling can be eliminated by reducing the width of the information content to always fit within the width of the screen. However, indiscriminately reducing the width of information content destroys the integrity of the content as the creator intended and possibly results in the loss of information.
[0011] It would be desirable to present content on smaller screen displays to preserve the content, yet selectively minimize the scrolling necessary to view content.
[0012] A system and method is provided that enables electronic devices with limited hardware or network capability to successfully access the same feature rich information content as full featured PC-based browsers with a large display screen, extensive user input facilities (e.g., mouse, keyboard, etc.), high CPU power, large memory, reliable network connections, a reliable power supply, and so on.
[0013] According to a preferred embodiment, the information content is optimized to selectively minimize the horizontal scrolling required to view the content. The need for horizontal scrolling is selectively removed where not necessary to preserve the context. Each component or sub-component of the original area, such as a frame, table row, table cell or nested table, is considered separately and may be preserved, resized, or replaced. Where the context requires that the content be wider than the viewable area of the screen, horizontal scrolling is preserved.
[0014] In an aspect of the present embodiment, the system enables an electronic device to access a number of different information sources including, but not limited to, marked up content like HTML, XML, WML, voice and multimedia. In the exemplary embodiment, a script execution engine is utilized to support scripting technologies such as JavaScript that dynamically generate content.
[0015] According to another aspect of the present embodiment, a distributed browser includes separable components, a server browser and a client browser, that enable an electronic device with a small display to efficiently access information content. In the exemplary embodiment, the server browser and the client browser work together to access the information content by separating functionality between the browsers, irrespective of the component's location. Preferably, the functionality applied to optimize information content access, arrangement, transmission, and navigation can be performed by the server browser rather than the client browser hosted on the portable or mobile device.
[0016] According to another aspect of the present embodiment, a QDOM converts data content into a document object tree represented by a mutable object having an array structure. Based on the nodes of the object tree, the QDOM generates an array of primitive data types for efficiently developing an optimized standard structure for use by a normalizer or other processing modules. In this manner, the QDOM extends the World Wide Web Consortium (W3C) DOM interface definition to an efficient model that provides high speed parsing, storage, and access while minimizing memory resource requirements.
[0017] In another aspect of the present embodiment, a normalizer adaptively tailors and folderizes markup based information content to accommodate an electronic device's particular software, hardware, and network characteristics. In the exemplary embodiment, the normalizer organizes any markup based information content into folders of interest. The user of the electronic device can then further explore the folders of interest as desired.
[0018] In another aspect of the present embodiment, a normalizer is utilized to selectively minimize the amount of horizontal scrolling required by the end user. In the exemplary embodiment, the normalizer modifies layout constructs such as tables and frames to achieve the reduction in horizontal scrolling. Nested layout is flattened and objects that are wider than the electronic device screen such as tables or images are analyzed to determine if they can be reduced to fit the screen without affecting context.
[0019] In yet another aspect of the present embodiment, metatags embedded in a markup language at the information source can provide instructions to the normalizer to take appropriate actions. Use of metatags can allow customization of original information content if a modified outcome is desired at the electronic device. In the exemplary embodiment, the metatags provide instruction to an automatic normalizer including, but not limited to, direct output of information content without normalization, the promotion of content into or out of folders, and dropping or filtering information content from the serialized output to an electronic device.
[0020] In another aspect of the present embodiment, pattern-matching templates are utilized to normalize the presentation of accessed information content. In the exemplary embodiment, a template normalizer utilizes regular expression pattern-matching to impose a template over a document and attempts to match the template to the document.
[0021] In another aspect of the present invention, an event translator provides additional compatibility with commercially available client browsers or end user applications that employ standardized protocols. In the exemplary embodiment, the event translator can be utilized on the server browser or the client browser to provide compatibility with standard client browsers.
[0022] In an aspect of the present embodiment, a serializer dynamically formats normalized content to a form that is optimized for a particular electronic device. The serialized output can be formatted to suit industry standard browsers, or targeted to an electronic device using the client side browser.
[0023] The present embodiments allow for electronic devices with limited hardware capability to access, on the fly, feature rich static and dynamic content, and applications. The server browser enables a client browser that utilizes a particular markup language to access information content that is of any type of markup language or technology. The distributed browser minimizes the functionality required on the device and implements the CPU and memory intensive functions on a server in the network, thus allowing wireless devices with intermittent or limited connectivity, limited processing power, etc. to provide an experience similar to that achieved with a desktop PC.
[0024] In an aspect of the invention, the system supports tiers of devices. For devices with enough processing power and capability, the system supports a mode where the client browser is able to access and render content from the information source without the need for a separate server browser. In this scenario, the server browser components are essentially co-existent with the client browser components.
[0025] In another aspect of the invention, the client browser can optionally utilize the server browser as a means to enhance capabilities, improve speed or add function. The use of the server browser by the client browser can be initiated either manually via a user preference or automatically via pre-defined algorithms that take into account the hardware and software capabilities of the device and the characteristics of the wireless network used for the request/response as well as the application needs indicated by the information source.
[0026] Multiple components including a serializer, normalizer, client browser, and/or the event translator work in conjunction with each other to convert user events within one markup domain into another markup domain while staying in the transaction to translate the meaning of the interaction appropriately. Thus, for example, user events such as scrolling, clicking, and voice commands interact with the QDOM to result in a change in presentation of the content.
[0027] Additionally, the present embodiments provide significantly higher speed and an efficient use of network bandwidth as desired information content can be cached on the server browser and on the client browser, if so desired, to enable quick access to the desired portions of the information content.
[0028] The present embodiments also provide for server browser-centric access to user profile and client browser state information (such as cookies), thereby facilitating the use of multiple devices by a single user.
[0029] The present embodiments provide a number of advantages and applications as will be more apparent to those skilled in the art. The exemplary embodiments utilize a distributed architecture for adaptively tailoring information content to an electronic device's hardware and network characteristics.
[0050] The information source
[0051] The electronic device
[0052] Information content from the information source
[0053] In the exemplary embodiment, the server browser
[0054] The server browser
[0055] Preferably, communications between the client and server browsers
[0056] Therefore, the communications network
[0057] To provide an exemplary illustration, assume that a PDA hosts a client browser, a PC hosts a server browser, and the PDA and PC are both connected to an Ethernet network. Then, the client browser and the server browser could perform information transactions over the Ethernet network. Such transactions would utilize Ethernet or similar IEEE 802.3 protocols. Thus, in this example, the client and server browsers communicate over a wired network.
[0058] In another example, assume that an internet-enabled refrigerator hosts a client browser, a set-top box hosts a server browser, and both could perform information transactions over a Bluetooth or IEEE 802.11 wireless LAN. Then, according to this example, the client and server browsers are communicating over a wireless network.
[0059] The distributed browser supports network optimization between the client browser
[0060] One implementation of this type of transport layer optimization is based on the Stream Control Transmission Protocol (SCTP) [Network Working Group RFC 2960] with parameters and message sequencing optimized for wireless transmission media. The key feature of the SCTP-based network optimization protocol is support for multiple independent streams of application messages over a single connection. This reduces or can eliminate the “head-of-line-blocking” problem associated with the use of TCP due to the necessity to maintain strict sequence messaging. The ability to support multiple independent streams is key to the application of this protocol to the distributed browser
[0061] Another feature of the SCTP-based network optimization protocol is the capability to monitor the reachability of the remote end-point thereby providing transport-level fault tolerance. The SCTP-based network optimization protocol also supports preservation of application message boundaries for logical chunks of data sent over a single stream. This is a particularly efficient method for the client browser
[0062] Referring again to
[0063] The server browser
[0064] According to the exemplary embodiment, the server browser
[0065] Furthermore, the server browser
[0066] To deliver these capabilities, the server browser
[0067]
[0068] For example, the user agent
[0069] In addition to transmitting the resource identifier, the user agent
[0070] Preferably, the user agent
[0071] Beyond industry conformance, the server browser adds the concept of dynamic user agent
[0072] Information content might also use XML information content and XSL style sheets instead of HTML as the preferred internet/intranet information content format. By using XML information content and an XSL style sheet, it can provide a clear separation of data and presentation. The XSL style sheet is applied to the XML information content by an XSLT engine to present the information content to an electronic device
[0073] The XSL style sheet is applied to the XML information content at the information source
[0074] In addition to providing normalizer functionality, the system can also use templates and meta-tag markup to alter the original information content to better suit an end user application for which it was not originally designed. This can be achieved through the addition, removal or substitution of sections of content, tags and attributes (separately or together) in the markup, described more below.
[0075] Information content might also use VoiceXML (www.voicexml.org), which is an XML based language for specifying voice dialogs, including audio prompts and text-to-speech (TTS) for output and touch-tone keys (DTMF) as well as automatic speech recognition (ASR) for input. VoiceXML technology enables consolidation of voice and web applications. For example, it can be used with voice-only devices to access a voice portal, or used to facilitate multi-modal (graphical and voice) dialogs to VoiceXML enabled client browsers.
[0076] Preferably, the system (
[0077] Location based services might also prove to be very popular in this industry as they are well suited to mobile applications. Preferably, the server browser
[0078] Referring back to
[0079] Referring back to
[0080] The translated information is then organized into a logically structured format for further processing by the QDOM
[0081] Referring back to
[0082] This extension to the QDOM
[0083] Referring back to
[0084] Preferably, the CSS processor is capable of supporting the W3C specifications for CSS levels 1 & 2. In addition, the CSS processor interacts with the script executor
[0085] Referring again to
[0086] The serializer
[0087] The server browser
[0088] Referring back to
[0089]
[0090] Preferably, the event translator
[0091] Referring back to
[0092] Furthermore, portions of the client browser
[0093] Moreover, the electronic device
[0094] In an exemplary embodiment, the client browser
[0095] Referring now to
[0096] The event controller
[0097] Examples of such end user applications include, but are not limited to, email, instant messaging, media players and other such plug-ins. Further, multiple different kinds of browsers designed for particular markup types (HTML, cHTML, WML, etc.) and so forth can also be supported by the micro-gateway
[0098] In one aspect of the exemplary embodiment, the micro-gateway
[0099] Additionally, the micro-gateway
[0100] The microbrowser
[0101] Typically the microbrowser
[0102] It should be understood, however, that an additional property of the browser is the ability to download and install other applications or plug-ins as needed to support non-markup based content, including images, audio, video, and multipurpose internet mail extensions (MIME) or secure MIME (S/MIME) document formats such as plain text, Acrobat (e.g., “*.pdf” format), Microsoft Word and so forth. Content of these types can be viewed through the use of these other applications or plug-ins by the micro-gateway
[0103] Preferably, the event controller
[0104] The DOM store
[0105] When the DOM is examined or modified, the event controller
[0106] In the case of a click event, the DOM facade
[0107] In cases where a standard browser (cHTML, WML, XHTML, etc.) is used, there is preferably no event controller. The requests from the standard browser are interpreted and translated by the event translator into events that the DOM facade
[0108] An “active” page is a page from the information source that is of higher than normal interest to the user of the client browser
[0109] Preferably, the client browser
[0110] Further, the client browser
[0111] The server browser
[0112] Preferably, the interface (API) to the Push Server will conform to the WAP Push Protocol specified by the WAP Forum. Accordingly, the push initiation may also be via a Multimedia Messaging Service (MMS) Server/Proxy-Relay. As described in that WAP Forum specification [WAP-205-MMSArchOverview-20010425-a] MMS Server is used to provide storage services for the messages while the MMS Proxy-Relay component interacts with the client browser
[0113] The distributed browser
[0114] An events message preferably carries information between the server and client browsers
[0115] The first byte of each message preferably contains an identifier that uniquely defines which type of message is contained in the data being sent. The 2-byte integral values are always “little endian,” i.e., bytes at lower addresses have lower significance.
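By way of illustration only, the two conventions above (a leading type byte and little-endian 2-byte integers) can be sketched in Python. The three-byte layout used here, a type byte followed by a single 2-byte length field, and the identifier value `MSG_TYPE_EVENTS` are assumptions made for the example; the actual header fields are not specified in this passage.

```python
import struct
from typing import Tuple

# Hypothetical message type identifier; the real values are not given in the text.
MSG_TYPE_EVENTS = 0x01

# "<" selects little-endian byte order, "B" is one unsigned byte,
# and "H" is one unsigned 2-byte integer.
HEADER_FORMAT = "<BH"


def pack_header(msg_type: int, length: int) -> bytes:
    """Pack a type byte followed by a 2-byte little-endian integer."""
    return struct.pack(HEADER_FORMAT, msg_type, length)


def unpack_header(data: bytes) -> Tuple[int, int]:
    """Recover (msg_type, length) from the first three bytes of data."""
    msg_type, length = struct.unpack_from(HEADER_FORMAT, data, 0)
    return msg_type, length
```

For the value 0x1234, the low-order byte 0x34 is written first and the high-order byte 0x12 second, which is what "bytes at lower addresses have lower significance" means.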
[0116]
[0117] The event data area of the events message may be compressed and/or encrypted as specified in the header component. When converted to plain text, the data area is defined as having the following exemplary structure:
[0118] <XML Events Message>=<Event Protocol Version><Event Separator><Session ID><Event Separator><XML Events><EOM>
[0119] <Event Protocol Version>=Integer
[0120] <Session ID>=<Server Session ID>|<Server Session ID><integer separator><Client Session ID>
[0121] <Server Session ID>=<Device Type><integer separator><Page ID>
[0122] <Device Type>=Integer (uniquely identifying different devices)
[0123] <Page ID>=Integer
[0124] <Client Session ID>=Integer
[0125] <XML Events>=<Event>|<Event><Event Separator><XML Events>
[0126] <Event>=<Event ID><field separator><XML Node><field separator><Attributes>
[0127] <Event ID>=integer (see table below)
[0128] <XML Node>=Integer
[0129] <Attributes>=<value>|<value><value separator><Attributes>
[0130] <value>=ASCII text
[0131] <EOM>=\r\n (0x0D 0x0A)
[0132] <Event Separator>=|
[0133] <Field separator>=*
[0134] <Value separator>={circumflex over ( )}
[0135] <Integer separator>=,
[0136] All events contained within an events message belong to the same session or page of content.
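The plain-text structure defined above can be illustrated with a short Python sketch that serializes and parses an events message. The sketch assumes the value separator is the caret character "^" and that an event with no attributes carries an empty attribute field; the class names, function names, and numeric event identifiers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EVENT_SEP = "|"   # <Event Separator>
FIELD_SEP = "*"   # <Field separator>
VALUE_SEP = "^"   # <Value separator> (assumed to be the caret)
INT_SEP = ","     # <Integer separator>
EOM = "\r\n"      # <EOM>


@dataclass
class Event:
    event_id: int
    xml_node: int
    attributes: List[str] = field(default_factory=list)


@dataclass
class EventsMessage:
    version: int
    device_type: int
    page_id: int
    client_session_id: Optional[int] = None
    events: List[Event] = field(default_factory=list)


def serialize(msg: EventsMessage) -> str:
    """Build the plain-text data area for one events message."""
    session = [msg.device_type, msg.page_id]
    if msg.client_session_id is not None:
        session.append(msg.client_session_id)
    parts = [str(msg.version), INT_SEP.join(str(n) for n in session)]
    for ev in msg.events:
        attrs = VALUE_SEP.join(ev.attributes)
        parts.append(FIELD_SEP.join([str(ev.event_id), str(ev.xml_node), attrs]))
    return EVENT_SEP.join(parts) + EOM


def parse(text: str) -> EventsMessage:
    """Parse a plain-text data area back into an EventsMessage."""
    if not text.endswith(EOM):
        raise ValueError("missing end-of-message marker")
    version, session, *raw_events = text[: -len(EOM)].split(EVENT_SEP)
    ids = [int(n) for n in session.split(INT_SEP)]
    if len(ids) not in (2, 3):
        raise ValueError("session ID must contain two or three integers")
    events = []
    for raw in raw_events:
        event_id, xml_node, attrs = raw.split(FIELD_SEP)
        # An empty attribute field is treated as "no attributes" (assumption).
        values = attrs.split(VALUE_SEP) if attrs else []
        events.append(Event(int(event_id), int(xml_node), values))
    client = ids[2] if len(ids) == 3 else None
    return EventsMessage(int(version), ids[0], ids[1], client, events)
```

Because the separators are single characters, the parser relies on attribute values having been encoded beforehand so that they never contain "|", "*", or "^".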
[0137] When there is only a single client browser
[0138] Preferably, the page id is generated on the server browser
[0139] For any Uniform Resource Locator (URL) data that is contained in an events message, standard URL encoding, as is known in the art, is used to ensure that the information content does not include any of the characters special to the proprietary packet format (i.e., “|”, “^”, and “*”). For any XML content contained in an events message, standard HTML encoding, as is known in the art, is used (e.g., where characters can be represented by “&#n;” where n is the ASCII code for that character).
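As a hedged illustration of the two encodings, the following Python sketch uses the standard library's percent-encoding for URL data and numeric character references of the form "&#n;" for XML content. The function names and the exact set of characters escaped in XML content are assumptions made for the example.

```python
import html
from urllib.parse import quote, unquote

# Characters escaped in XML content: markup-significant characters plus the
# three packet separators. The exact set is an assumption for this sketch.
_XML_SPECIAL = set('&<>"|^*')


def encode_url(url: str) -> str:
    """Percent-encode a URL so that no reserved character survives.

    With safe="" every character other than letters, digits, and "_.-~"
    is encoded, which covers "|", "^", and "*".
    """
    return quote(url, safe="")


def decode_url(encoded: str) -> str:
    return unquote(encoded)


def encode_xml_text(text: str) -> str:
    """Replace special characters with numeric character references."""
    return "".join(f"&#{ord(c)};" if c in _XML_SPECIAL else c for c in text)


def decode_xml_text(encoded: str) -> str:
    """Resolve character references back to plain characters."""
    return html.unescape(encoded)
```

After encoding, neither kind of payload contains an event, field, or value separator, so a receiver can split a message on those characters before decoding individual values.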
[0140] The node value maps directly to a node in the DOM tree (e.g., output of the QDOM
[0141] The following table lists the different events that can be contained within an events message:
Event | Attribute list | Description
Cleaned | (none) | Notification to clients that old data has been purged so that clients can check any cached page references
Error | Error message | Description of an error on server
Expand | (none) | Client request for content of a normalizer folder
Expand | Document | Content of a normalizer folder
Load | URL, summary option, table option, JavaScript option | Client request for a new content page, including options on whether to normalize, include tables and allow JavaScript processing
Load | URL, summary option, table option, JavaScript option, document | Content page response or push from server, including options set in original request
Notify | Message data | Non-error message to client(s)
Onblurchange | new value for input element | The user has changed the content of an input element and moved focus away from it
Onclick | (none) | Click on link element
Reload | (none) | User wishes to force a reload of the current content page in the device application. The server will replace any existing session with a new load of the page from the web.
Stop | (none) | User request to stop any message identified by the MessageID in the header.
[0142]
Event | Attribute list | Description
Submit | (none) | User has completed a form and is submitting all onblurchange data to the server
Authenticate | Realm | The remote HTTP server issued a challenge string requiring the user to prove possession of a valid user id and password for the realm
Authenticate | Authentication tokens | <username>:<password>, encoded in Base64 to be submitted back to the origin server
Alert | Message | Server initiated message. The device displays the alert message followed by an OK button.
Alert | (none) | Device returns when the user presses OK.
Confirm | Question | Server initiated message. The device displays the question followed by an OK and Cancel button.
Confirm | confirmation status | Device returns the button pressed.
Prompt | Message, default | Server initiated message. The device displays the message followed by a text input field and OK, Clear and Cancel buttons. The default message is displayed as the initial input.
Prompt | Button pressed, return string | If the user clicks the Cancel button, the return string should be null. If the user clicks the OK button, the device returns the value currently displayed in the input field.
[0143] To conserve bandwidth over the communications network
[0144]
[0145]
[0146] The server browser
[0147] The mutability of a QDOM
[0148] A QDOM
[0149]
[0150] In another exemplary embodiment, the same information can be stored in structures for each node and attribute. The QDOM
[0151] The actual string names of the tags and attributes of tree elements are replaced by a corresponding value equivalent. A dictionary of the strings and their corresponding value is preferably built up as necessary to deal with a particular set of XML tags. For performance reasons, pre-compiled dictionaries can be used for the well-known markup languages, such as HTML or WML.
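The string-to-value dictionary described above can be sketched as follows. This is a minimal illustration only: the class name, the choice of consecutive integers as the "value equivalent", and the short list of pre-compiled HTML tag names are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


class TagDictionary:
    """Maps tag or attribute names to small integers and back."""

    def __init__(self, precompiled: Iterable[str] = ()) -> None:
        self._code_by_name: Dict[str, int] = {}
        self._name_by_code: List[str] = []
        # Seed the dictionary with well-known names so their codes are fixed.
        for name in precompiled:
            self.code_for(name)

    def code_for(self, name: str) -> int:
        """Return the code for name, adding a new entry if necessary."""
        code = self._code_by_name.get(name)
        if code is None:
            code = len(self._name_by_code)
            self._code_by_name[name] = code
            self._name_by_code.append(name)
        return code

    def name_for(self, code: int) -> str:
        """Return the original string for a previously assigned code."""
        return self._name_by_code[code]


# Hypothetical pre-compiled entries for a well-known markup language.
HTML_TAGS = ("html", "head", "body", "table", "tr", "td", "a", "img")
```

With such a dictionary, tree nodes can store an integer in place of each tag name, and a previously unseen XML tag simply receives the next free code the first time it is encountered.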
[0152]
[0153] Preferably, the interface to the QDOM
[0154] Since the underlying structure of the QDOM
[0155] In situations where resources are limited, such as on a PDA, the QDOM
[0156] A QDOM
[0157] A number of preliminary tests have been performed to determine the time saved using the QDOM
[0158] The normalizer organizes the DOM tree into tiers or folders under headings that contain related content. The result is a set of hierarchical DOM node collections. The characteristics of font, font size, font color, hue saturation comparison of background and foreground color and Cascading Style Sheet or XSLT properties are used to determine the weight of a text node. The weight is then used to determine whether it will be inserted into a normalized document tree as a parent or child. The parent nodes become folder titles and the child nodes become the folder contents. Thus, higher weight document objects are pushed to the top of the tree so the user can decide whether to “walk” down the branch or not.
[0159] The normalizer dynamically streamlines and folderizes the content, either automatically or via predefined additional rules, to achieve an experience similar to reading a newspaper. The normalizer, including the template normalizer and meta-tags, allows the content source to be redefined once for all networks and device types. The alternative technologies in the industry require long-cycle-time redevelopment of the content, often specific to one or more of the following: each device's ergonomics, a particular client-only browser, or a particular network type.
[0160] The goal of normalizing is to adapt desktop-focused web content to handheld browsers. This requires filtering unsupported content, dropping unneeded content, and reordering and partitioning content to improve navigation and application flow for display on a limited device. Some of the functions of normalization are to folderize/partition content, drop content not required on a handheld device, reorder content, and provide prompts/names to input elements.
[0161] The normalization process can utilize a weighted heuristic and pattern recognition to create the contextual relationships among nodes in the source tree. The output from the normalization process is a hierarchical content tree. Preferably, the normalized tree is not specific to a particular presentation language. Therefore it can be transcoded for display by any type of client browser.
[0162] Content collapsing rules in the automatic normalizer utilize the previous page loaded to determine if similar constructs exist in a page which can be collapsed into folders or selectable input elements on subsequent loads. This is performed by comparing the previous page loaded with the current page. The trees of the documents are compared to determine if similar fragments (list of links, table, image) exist. The similar fragments of the tree are collapsed into folders or select input elements. The effect is to conserve display space on the device.
[0163] Electronic devices can have limited display characteristics such as display size, font types, color etc. Most web content is tailored for display on desktop browsers which not only assume a large screen, but also support a rich set of fonts, colors, and formatting constructs such as tables and frames. The normalizer adapts existing information content for display on the electronic device
[0164] Referring back to
[0165] It should be noted that while the exemplary embodiments of the normalizer processes concentrate on the normalization of HTML content, including that generated by scripting technologies such as JavaScript, these same processes can be applied to other markup content. It is well suited to any XML content.
[0166] The automatic normalization process traverses a DOM tree from the QDOM
[0167] The normalization process begins with the root node of the document and traverses the tree along a depth first path to maintain context at all times.
[0168] If a node is the beginning of a table, a table pattern recognition process is preferably executed. The entire table is weighted and pattern recognition criteria are compared to determine if the table matches a defined pattern. The table recognition criteria define a profile for different data table types. Each cell in the criteria is defined to be either greater than, less than or equal to a root table weight, a “don't care” or defined to contain certain nodes such as anchors or images. The root table weight can be derived from any cell in the table such as the cell at position row 0, column 0 or can be derived outside the context of the table. These criteria define the pattern to be matched. If a pattern is recognized, the table cells are formatted corresponding to that pattern. If a pattern cannot be recognized, the weighted node processing continues.
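The per-cell criteria described above can be illustrated with the following Python sketch. It is not the recognition process itself, only one possible reading of it: the class names, the string labels for the criterion kinds, and the requirement that a pattern have the same dimensions as the table are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Cell:
    weight: int
    node_types: FrozenSet[str] = frozenset()  # e.g. {"a", "img"}


@dataclass(frozen=True)
class Criterion:
    kind: str                        # "gt", "lt", "eq", "any", or "contains"
    node_type: Optional[str] = None  # used only when kind == "contains"

    def matches(self, cell: Cell, root_weight: int) -> bool:
        if self.kind == "any":       # the "don't care" criterion
            return True
        if self.kind == "gt":
            return cell.weight > root_weight
        if self.kind == "lt":
            return cell.weight < root_weight
        if self.kind == "eq":
            return cell.weight == root_weight
        if self.kind == "contains":
            return self.node_type in cell.node_types
        raise ValueError(f"unknown criterion kind: {self.kind}")


def table_matches(
    table: Sequence[Sequence[Cell]],
    pattern: Sequence[Sequence[Criterion]],
    root_weight: Optional[int] = None,
) -> bool:
    """Return True if every cell satisfies the criterion at its position."""
    if root_weight is None:
        # Default: derive the root table weight from row 0, column 0.
        root_weight = table[0][0].weight
    if len(table) != len(pattern):
        return False
    for row, criteria in zip(table, pattern):
        if len(row) != len(criteria):
            return False
        if not all(c.matches(cell, root_weight) for cell, c in zip(row, criteria)):
            return False
    return True
```

For instance, a pattern whose first row is all "eq" and whose remaining rows are "lt" or "contains" an anchor would describe a table with a heavily weighted heading row above rows of lighter link entries.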
[0169] The major part of the weighted node process is the maintenance of a weighted node stack. The first element of that stack is always the “DOCUMENT” itself, having by default the highest possible weight. The normalization process takes the next node from the DOM tree. The node is first filtered to determine if it has an effect on weighting or presentation. If the node is not significant, it is preferably dropped. Nodes such as the HTML tag in HTML are not significant since the tag has no effect on presentation. Next it is determined whether the node is a weighting node or a content node. Weighting nodes are nodes that affect the display of rendered content, such as a bold or heading format tag. Some weighting tags may have a negative weight that allows nesting of the tags and emulates a hierarchy of node weights, such as nested list items. Content nodes are nodes such as text nodes, input nodes, and image nodes.
[0170] When a weighting node is encountered, its weight is added to the accumulated weight. When a content node is encountered, it is assigned the accumulated weight and becomes a weighted node. The weighted node finds its position on the weighted node stack by finding the lightest element on the stack whose weight is greater than its own; that element is the node's parent. Stack nodes from that point on are preferably deleted from the stack. The new weighted node becomes a child node of that parent. When a node goes out of scope (e.g., when a TABLE ends), the normalization process checks the weighted node stack and removes all nodes that belonged to the expired scope of influence.
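The stack maintenance described above can be sketched as follows. The tag weights, the event stream format, and the subtraction of a tag's weight when the tag closes are assumptions made for this example; scope-of-influence handling is left out for brevity.

```python
import math
from dataclasses import dataclass, field

@dataclass
class WeightedNode:
    label: str
    weight: float
    children: list = field(default_factory=list)

# Hypothetical weights for weighting tags; real values are not given in the text.
TAG_WEIGHTS = {"h1": 8, "h2": 6, "b": 2}

def build_outline(events):
    """Build a weight-based hierarchy from (kind, value) events."""
    root = WeightedNode("DOCUMENT", math.inf)  # heaviest possible, never popped
    stack = [root]
    accumulated = 0
    for kind, value in events:
        if kind == "open":        # weighting node: add to the accumulated weight
            accumulated += TAG_WEIGHTS.get(value, 0)
        elif kind == "close":     # assumption: weight is removed when the tag ends
            accumulated -= TAG_WEIGHTS.get(value, 0)
        elif kind == "text":      # content node: becomes a weighted node
            node = WeightedNode(value, accumulated)
            # Drop stack entries that are not heavier than the new node; the
            # entry left on top is the lightest one with a greater weight.
            while stack[-1].weight <= node.weight:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
    return root

events = [
    ("open", "h1"), ("text", "News"), ("close", "h1"),
    ("text", "Intro paragraph"),
    ("open", "b"), ("text", "Bold note"), ("close", "b"),
    ("open", "h1"), ("text", "Sports"), ("close", "h1"),
    ("text", "Scores"),
]
outline = build_outline(events)
```

In this sketch the two headings become children of the document, and the lighter text that follows each heading is attached beneath it.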
[0171] For example, if a node on the weighted node stack is part of a table and the table's scope of influence has expired, then that node is removed from the weighted node stack. However, if that node belongs to more than one scope of influence (e.g., it is part of one table nested inside another table), all scopes of influence are checked against that node, and it is preferably removed only when they have all expired. When the inner table ends, the node stays until it is replaced by a heavier node or the outer table ends.
[0172] The template normalizer
[0173] The template normalizer
[0174] The template normalizer
[0175] The template normalizer
[0176] The template normalizer
[0177] Tags:
[0178] <xgrp> equivalent to a bracket in regular expressions
[0179] <xany> equivalent to a wild card in regular expressions
[0180] <xalt> equivalent to “or” in regular expressions
[0181] <xadd> signifies the addition of contained markup
[0182] <xnone> equivalent to “empty” in regular expressions
[0183] Attributes:
[0184] “xtimes” the number of times to apply a recurring pattern
[0185] “xtitle” specifies a new title for a matched pattern
[0186] “xid” specifies a name of an XML node
[0187] “xparent” re-parent an XML node to a specified element
[0188] “xdrop” delete a node from the tree
[0189] “xformat” produce a formatting attribute for an input field
[0190] “xaction” apply a user-specified algorithm to the branch of a tree
[0191] The normalization process preferably scans a dictionary of templates and an initial comparison is made based on a URL specified for the template and the URL of a document to be processed. If the URLs match, then the template normalization process begins.
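The initial URL comparison might look like the sketch below. The text does not say how URLs are compared, so the use of shell-style wildcards through Python's `fnmatch.fnmatchcase`, the template dictionary layout, and the example URLs are all assumptions.

```python
from fnmatch import fnmatchcase

def find_templates(templates, document_url):
    """Return the templates whose URL pattern matches the document URL.

    Assumption: a template URL may contain shell-style wildcards such as '*'.
    Note that fnmatch also treats '?' and '[' as special characters.
    """
    return [t for t in templates if fnmatchcase(document_url, t["url"])]

# Hypothetical template dictionary entries.
templates = [
    {"name": "news", "url": "http://example.com/news/*"},
    {"name": "home", "url": "http://example.com/"},
]

matched = find_templates(templates, "http://example.com/news/today.html")
unmatched = find_templates(templates, "http://other.example/")
```

Only templates returned by this first pass would go on to the tree-matching step.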
[0192] The template normalizer
[0193] If the match process succeeds, the template normalizer
[0194] During the application of the changes, the template normalizer
[0195] Where a counter is used in a variable expression, the variable will be resolved using the current value of the counter.
[0196] If $z=a and $ya=“data” the expression $(y($z)) would be resolved to $(y$(z))=$ya=“data”.
[0197] $z gets resolved to a and this creates the expression $ya which is then resolved to “data”.
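The inside-out resolution in this example can be sketched as below. The sketch handles only the parenthesized `$(name)` form and stores values in a plain dictionary; both choices, and the `resolve` function itself, are assumptions made for illustration.

```python
import re

# Matches an innermost reference: $(name) where name has no nested '$' or parentheses.
INNERMOST = re.compile(r"\$\(([^$()]+)\)")

def resolve(expression, variables):
    """Resolve nested $(...) references from the inside out."""
    while True:
        match = INNERMOST.search(expression)
        if match is None:
            return expression
        name = match.group(1)
        # Substitute the value; an unknown name raises KeyError.
        expression = (expression[:match.start()]
                      + variables[name]
                      + expression[match.end():])

variables = {"z": "a", "ya": "data"}
result = resolve("$(y$(z))", variables)
```

The inner reference `$(z)` is replaced by `a` first, which produces `$(ya)`, and that in turn resolves to `data`. A counter value could be stored in the same dictionary and resolved in the same way.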
[0198] If xcounter=“x” and the first iteration of an xgrp is occurring, then the expression $a($x) will be resolved to
[0199] $a1.
[0200] If a variable is used in the context of an xtitle attribute, the value for the variable is derived from the text value of the DOM node that is referenced by the variable.
[0201] If the variable is used in the context of an xparent attribute, then the variable is resolved to the DOM node to which the referring node is moved as a child.
[0202] The xid attribute is used to set the value of a variable to the DOM node.
[0203] Template normalization involves matching a template DOM tree with an input DOM tree and applying changes to the input DOM tree. The automatic normalization algorithms may be called during the apply step of template normalization where specified by the xaction attribute.
[0204] With integration of the automatic normalization algorithms, a template can be utilized to drop and reorder content while calling the automatic normalization algorithms to partition the