[0001] The present application claims the benefit, under 35 U.S.C. §119(e), of U.S. Provisional Application Serial No. 60/230,354, filed Sep. 6, 2000, and entitled “System and Method for Processing Database Queries” which is hereby incorporated by reference.
[0002] The field of the invention relates generally to databases, and more particularly to processing database queries.
[0003] Computer systems having databases are generally used to store and obtain information. These computer systems may be stand-alone computers that serve one or more users, or the systems may be networked to provide access to a database from multiple systems referred to as clients. These database systems carry out communication with clients through one or more communication protocols, as is known in the art. For example, these databases and clients may communicate over a network, such as the Internet, using the well-known TCP/IP protocol. The client systems may interact with the database using one or more application programs such as an Internet browser which accepts input from a user and displays information received from database systems. The database may be, for example, a relational database which stores character, binary, or other data formats that may be searched and retrieved.
[0004] Database access has become standard for supporting operations performed on networks such as the Internet. For example, databases support searching by providing a resource which stores links to data resources and to other databases. In particular, databases are used to classify and store links, called Uniform Resource Locators (URLs), that serve as addresses to resources such as Web sites, audio/video files, and other types of media. These addresses are provided by database systems in response to queries submitted through interfaces to the database systems displayed in browser applications. These database systems are sometimes referred to as search engines, and may be used as a part of a directory service, a company's Web site, or any other method for searching and retrieving information.
[0005] Interfaces for databases generally include a text entry field, wherein a user may enter one or more keywords associated with what he or she is searching for. These keywords are processed by the database system, and a set of results is displayed to the user. The user then reviews these results to determine how pertinent the results are to what he or she is looking for. These databases function as directory services for resources on the Internet. Examples of such searching systems include Yahoo! (www.yahoo.com), Google (www.google.com), and others. Yahoo! and other search engines generally provide two methods for finding information. The first is a directory method, which provides pathways for navigating through content related by a logical relationship. The second is a keyword search. Some services such as Yahoo! utilize the catalog information provided by other services such as Google for performing the search.
[0006] One main problem with existing systems that use the directory search method is that a user must navigate hierarchical directory structures. Taking a “wrong” or a non-ideal “turn” in the search path, by selecting a branch in the search which leads away from the best result, will significantly degrade the final outcome of the user's search.
[0007] Keyword search systems generally accept keyword entry and display a number of results, the results being ordered based upon the frequency of the keyword appearing in the resource, or some other ranking criteria. The database systems perform preprocessing on the resources by indexing data of the resources by keywords. This involves analyzing Internet resources with programs referred to as Web spiders or crawlers which visit Internet links and perform keyword processing on resource content associated with the link, generally involving millions of processed resources. Furthermore, a perplexingly large number of search results is typically returned by such keyword search engines. Thousands of search results are usually presented to the user, and the sorting of the search results can involve errors due to the automated indexing of the search results or the difficulties described above.
[0008] Furthermore, simple keyword association and relevancy also sometimes produce irrelevant results. When documents are retrieved based upon a keyword search, resources such as documents having that combination of keywords are retrieved, even though the documents' content may not be relevant. A user must individually evaluate each link to determine whether an indicated document is relevant. Also, the number of records produced is generally large, and a query retrieving thousands of records is not uncommon. Retrieving a large number of records is problematic for a user, as the process of reviewing each link is tedious and time-consuming and requires excessive computational resources.
[0009] Keywords and other natural language (NL) inputs are generally processed by the system as shown in
[0010] A database
[0011] As discussed above, search engines may include an NL query preprocessor
[0012] Because meaning may be attributed to a query at many levels, NL processing is a complex process which involves complex algorithms. Further, these algorithms are not perfect; there are frequent ambiguities in natural language interpretation. Because of these ambiguities, and because of the inherently subjective nature of database queries, NL processing of input queries yields imperfect search results. Natural language processing is more fully described in the book entitled “Natural Language Understanding” by James Allen, 2nd edition (January 1995), Addison-Wesley Publishing Co., which is hereby incorporated by reference.
[0013] As discussed, there are many drawbacks of implementing NL in association with database searching. For example, the user may pose a question, and the question is not interpreted properly, yielding incorrect results. The user may need to restructure the question in a different manner to obtain meaningful results.
[0014] There are sites that implement NL analysis such as the portal AskJeeves (www.askjeeves.com), which ascribes meaning to input queries by matching a user's question to a question that was previously defined. This portal allows a user to pose questions in an NL format, and retrieves the most relevant question based on a keyword analysis. However, as discussed above, natural language analysis produces ambiguous results and is complicated to perform. Thus, AskJeeves, and others, do not generally perform a perfect match. Further, questions posed to the system by a user do not necessarily have a corresponding question predefined in the system. Also, sample questions presented to the user in response to a query are usually not relevant. Because AskJeeves is linked to a keyword indexed database, the results returned must be processed by the user, and the AskJeeves system produces the same volume of information as standard keyword matching search engines.
[0015] In addition to keyword-based and NL-based search functions, many database search engines also provide a hierarchical listing of information to complement these functions. This hierarchical listing is a categorization of links, usually programmed manually, and takes the form of directories. When new links are added, they are generally placed within the predetermined hierarchy or directory tree. As described earlier, navigating through a directory tree requires a user to accurately choose the best choice from a plurality of presented choices. The presented choices may themselves not include a choice corresponding to a path leading to the information the user actually desires to find. Making a non-ideal selection, or being presented with selections none of which are ideal, forces a user down a search path that will not lead to the desired results. Also, excessively long search paths involving many user selections are generally required to reach the end point of a search. No logical relationship necessarily exists between members of a directory level or members of different directory levels. These difficulties cause directory-based search engines or navigators to be an inefficient means for retrieving information.
[0016] According to one embodiment of the invention, a system and method are provided wherein common elements of questions, or “queries”, are used to assist users in constructing their questions. In particular, users select a “query element” value from a list of allowed values, or a “domain,” of each query element. Instead of using keywords and determining their relevancy to elements in a database, user-selected values of query elements of a query are used to retrieve indexed results from the database system. The database system may be, for example, a relational classification system. Information may be stored in the database by classifying the information by assigning, for an information object of the database, a number of values for each query element. Also, a relevancy index may be assigned among the allowed values of a query element, if multiple values of a query element are possible. According to one aspect of the invention, the query submitted by a user is restricted to one of a plurality of predefined formats and allowed values, and the predefined formats and allowed values are associated with database information objects.
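The selection-and-lookup idea in this paragraph can be illustrated with a minimal Python sketch. The element names, the small domains, and the `ClassifiedStore` class are hypothetical and chosen only for illustration; the application does not prescribe a data structure, and an exact-match tuple key is just one simple way to form an index from selected values.

```python
from collections import defaultdict

# Hypothetical query elements and their domains (allowed values).
DOMAINS = {
    "who": {"I", "my spouse"},
    "do": {"apply for", "renew"},
    "what": {"driver's license", "passport"},
}
ELEMENT_ORDER = ("who", "do", "what")


def make_key(selection):
    """Validate a selection against each domain and form an index key."""
    for element in ELEMENT_ORDER:
        value = selection[element]
        if value not in DOMAINS[element]:
            raise ValueError(f"{value!r} is not allowed for {element!r}")
    return tuple(selection[element] for element in ELEMENT_ORDER)


class ClassifiedStore:
    """Maps index keys built from query element values to information objects."""

    def __init__(self):
        self._index = defaultdict(list)

    def classify(self, info_object, selection):
        # Store the object under the key formed from its assigned values.
        self._index[make_key(selection)].append(info_object)

    def retrieve(self, selection):
        # Look up objects by the key formed from the user's selected values.
        return list(self._index.get(make_key(selection), []))
```

Because the user can only pick values from each domain, no keyword parsing is needed at query time; the lookup reduces to a single key comparison.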
[0017] Because it is the user's responsibility to properly phrase a query or search criteria in a free-text format, conventional systems perform only as well as the input provided by the user. According to various embodiments of the present invention, the query response information is provided in a non-hierarchical format, such that the user does not need to traverse directories or hierarchies to find the requested information.
[0018] Further, the amount of calculation carried out by the server to respond to a query and retrieve information is often far less than that required by standard keyword-parsing and NL-processing systems. Additionally, the size of the database needed to store the information is reduced.
[0019] Some embodiments call for a human or a machine “editor” to sort and assign relevance values to query element values. This way an optimized first-time query brings up only the most relevant database search results.
[0020] Also, Internet links may be represented by arbitrary numbers, or codes, to prevent other Internet services from directly copying the search results. That is, to hide content from other Internet services, results such as links that are returned to the user are assigned arbitrary numbers used by the system to refer to the actual links.
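One way to picture the code-for-link substitution is a server-side table that hands out random codes and resolves them back to the actual links. The sketch below is an assumption-laden illustration: the class name `LinkMappingTable`, the use of random hexadecimal tokens, and the code length are not taken from the application.

```python
import secrets


class LinkMappingTable:
    """Server-side table mapping arbitrary codes to actual links."""

    def __init__(self):
        self._code_to_url = {}
        self._url_to_code = {}

    def code_for(self, url):
        """Return the existing code for a URL, or assign a new random one."""
        if url not in self._url_to_code:
            code = secrets.token_hex(8)  # 16 hex characters, unrelated to the URL
            while code in self._code_to_url:  # retry on the unlikely collision
                code = secrets.token_hex(8)
            self._url_to_code[url] = code
            self._code_to_url[code] = url
        return self._url_to_code[url]

    def resolve(self, code):
        """Translate a code back to the actual link; None if the code is unknown."""
        return self._code_to_url.get(code)
```

Results sent to the client would carry only the codes, so another service copying the result list would not obtain the underlying links directly.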
[0021] Additionally, state information may be associated with the links and maintained by the server. In this way, outdated links can be tracked and the system can determine the last date a link was checked, or whether a link was found valid, invalid, or had any other condition worthy of reporting. Accordingly, some embodiments of the invention are directed to:
[0022] A method for processing database queries, comprising: presenting, to a user, a plurality of query elements, each query element having a plurality of allowed query element values; receiving, from the user, a plurality of selected query element values, each selected query element value corresponding to a respective query element, wherein the value of each selected query element is selected by the user from the respective plurality of allowed query element values corresponding to the respective query elements; indexing a database by forming an index from the selected query element values; and retrieving, from the database, information objects corresponding to the index.
[0023] A system for processing a database query, comprising: a server; a client coupled to the server through a network; a database coupled to the server, the database comprising information objects; a user interface running on the client for presenting query results to a user; a selector, implemented in the user interface, for allowing the user to select a plurality of selected query element values from respective pluralities of allowed query element values for respective query elements; and an indexer for forming an index to the database, using the selected query element values.
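The per-link state information mentioned before the method and system summaries above (last check date, valid or invalid status) could be kept in a small server-side record. The following is a hedged sketch; `LinkStatus`, `LinkState`, and `stale_links` are hypothetical names, and the staleness rule is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class LinkStatus(Enum):
    UNCHECKED = "unchecked"
    VALID = "valid"
    INVALID = "invalid"


@dataclass
class LinkState:
    """State the server maintains for one link."""
    url: str
    status: LinkStatus = LinkStatus.UNCHECKED
    last_checked: Optional[date] = None

    def record_check(self, is_valid, checked_on):
        # Remember the outcome and the date of the most recent check.
        self.status = LinkStatus.VALID if is_valid else LinkStatus.INVALID
        self.last_checked = checked_on


def stale_links(states, today, max_age_days):
    """Return links never checked, or last checked more than max_age_days ago."""
    return [
        s for s in states
        if s.last_checked is None or (today - s.last_checked).days > max_age_days
    ]
```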
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031] The present invention will be more completely understood through the following description, which should be read in conjunction with the accompanying drawings, in which similar reference numbers indicate similar structures.
[0032]
[0033] The communication network
[0034] Computer system
[0035] Computer system
[0036] It should be understood that the invention is not limited to a particular platform, processor, operating system, or network. Also, it should be apparent to those reading this application that the present invention is not limited to a specific programming language or computer system and that other appropriate programming languages and other appropriate computer systems could also be used.
[0037] In this application we generally consider that a database search operation, or query, is facilitated by breaking down the query into a plurality of query elements
[0038]
[0039]
[0040] Other query elements
[0041] Additionally, each individual query element value may be assigned a query element relevancy
[0042]
[0043] According to one aspect of the invention, server
[0044] In one embodiment of the invention, a relevancy is determined for each database object for domains having multiple possible values. In this manner, the query is executed based on values specified for each element against the classification of the information in the database rather than the information itself. Because keywords do not have to be indexed or NL analysis performed, search time is reduced. Also, because more relevant information is indexed, the result displayed to the user is more relevant, and unnecessary effort reviewing irrelevant links is not required.
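As a rough illustration of executing a query against the classification rather than the content, the sketch below stores, for each information object, a relevancy per allowed value of each query element and ranks matching objects. The nested-dictionary layout, the 0 to 1 relevancy scale, and the summing rule are all assumptions; the application does not specify how relevancies are combined.

```python
def score(classification, selection):
    """Sum the relevancies of the selected values.

    Returns None if the object is not classified under any one of the
    selected values, meaning it does not match the query.
    """
    total = 0.0
    for element, value in selection.items():
        relevancy = classification.get(element, {}).get(value)
        if relevancy is None:
            return None
        total += relevancy
    return total


def ranked_results(objects, selection):
    """Return names of matching objects, most relevant first."""
    scored = []
    for name, classification in objects.items():
        s = score(classification, selection)
        if s is not None:
            scored.append((s, name))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

Only the classification is consulted here, so the cost of a query depends on the number of classified objects and elements, not on the size of the underlying documents.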
[0045] It should be understood that one or more components of database query system
[0046]
[0047] Who (element 1): the subject of the question
[0048] Do (element 2): the action performed
[0049] What (element 3): the object of the action
[0050] Where (element 4): the location for the action
[0051] When (element 5): when the action was or will be performed
[0052] In the context of a special purpose Internet site such as a government-related Internet site that handles processing of drivers' licenses, auto registrations, passports, tax returns, postal services, social security, and other aspects of government, the following query elements
Who (element 1): Values: [I, my spouse, my family, my parents, etc.]
Do (element 2): Values: [Apply for, renew, inquire about, cancel, file for, travel, know, etc.]
What (element 3): Values: [Driver's license, 401K, passport, tax return, USPS, SSA, etc.]
Where/From_To (elements 4 and 5): Values: [In the U.S., State, County, City, Zip, Country, etc.]
When (element 6): Values: [in yearly increment, not applicable, etc.]
[0053] For example, a user
[0054] Data may be classified in the database
Who: [All]
Do: [apply for]
What: [driver's license]
Where: [MA]
When: [not applicable]
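A classification record like the one above can be compared against a user's selected values field by field. The sketch below assumes that a stored value of "All" matches any selected value for that element; this wildcard reading, the `Record` type, and the `matches` helper are illustrative assumptions rather than details stated in the application.

```python
from typing import NamedTuple

ALL = "All"  # assumed to act as a wildcard that matches any selected value


class Record(NamedTuple):
    """One classified information object, one field per query element."""
    link: str
    who: str
    do: str
    what: str
    where: str
    when: str


def matches(record, query):
    """Return True if every selected value agrees with the stored classification.

    query maps element names ("who", "do", ...) to the user's selected values.
    """
    for element, selected in query.items():
        stored = getattr(record, element)
        if stored != ALL and stored != selected:
            return False
    return True
```

For instance, a record classified as above would match a query from any "Who" value, provided the remaining elements agree.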
[0055] Therefore, a user
[0056] <link information><All><apply for><driver's license><MA><not applicable>
As discussed, a relevancy
[0057] According to another aspect of the invention, the query results
[0058] Further, the mapping table
[0059]
[0060] Table
[0061] Having now described a few embodiments of the invention, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other embodiments are within the scope of ordinary skill in the art and are contemplated as falling within the scope of the invention as defined by the appended claims and equivalents thereto.