Title:
Name browsing systems and methods
Kind Code:
A1


Abstract:
A system provides document browsing by proper name. The system identifies a subset of documents from a set of documents (310). The documents in the subset of documents include proper names. The system receives a selection of at least one of the proper names from the subset of documents (330) and searches the subset of documents to identify one or more of the documents in the subset of documents that include at least one occurrence of the selected proper name(s) (360). The system then presents the one or more of the documents as a result of the search (370).



Inventors:
Colbath, Sean (Cambridge, MA, US)
Boisen, Sean (Laurel, MD, US)
Shepard, Scott (Waltham, MA, US)
Nielsen, Susan S. (Annapolis, MD, US)
Wilson, Andrew (Columbia, MD, US)
Kubala, Francis G. (Boston, MA, US)
Application Number:
10/610799
Publication Date:
10/07/2004
Filing Date:
07/02/2003
Assignee:
COLBATH SEAN
BOISEN SEAN
SHEPARD SCOTT
NIELSEN SUSAN S.
WILSON ANDREW
KUBALA FRANCIS G.
Primary Class:
1/1
Other Classes:
707/999.003
International Classes:
G06F7/00; G06F17/00; G06F17/21; G06F17/28; G10L11/00; G10L15/00; G10L15/26; G10L21/00; (IPC1-7): G06F7/00



Primary Examiner:
ADAMS, CHARLES D
Attorney, Agent or Firm:
ROPES & GRAY LLP (BOSTON, MA, US)
Claims:

What is claimed is:



1. A method of providing name browsing across a plurality of documents, comprising: identifying a group of documents, documents in the group of documents including a plurality of proper names; receiving a selection of one or more of the proper names from within at least one of the documents in the group of documents; querying the group of documents based on the one or more of the proper names; and presenting, to a user, one or more of the documents from the group of documents as a result of the querying.

2. The method of claim 1, wherein the proper names correspond to names of at least one of people, places, and organizations.

3. The method of claim 1, wherein the documents include at least two of audio documents, video documents, and text documents.

4. The method of claim 1, wherein the identifying a group of documents includes: receiving one or more search terms from the user, searching a database of documents based on the one or more search terms, and identifying documents in the database, as the group of documents, based on a result of the searching.

5. The method of claim 1, wherein the identifying a group of documents includes: receiving, from the user, a plurality of documents as the group of documents.

6. The method of claim 1, further comprising: presenting the group of documents to the user; receiving a selection of at least one document from the group of documents; and providing the at least one document to the user with proper names visually distinguished.

7. The method of claim 6, wherein the proper names relate to names of people, places and organizations; and wherein the providing the at least one document includes: visually distinguishing the proper names relating to people, places, and organizations differently.

8. The method of claim 6, wherein the providing the at least one document includes: providing text relating to the at least one document.

9. The method of claim 8, wherein the at least one document originates from one of an audio source and a video source and the text relating to the at least one document includes a transcription of one of an audio signal from the audio source and a video signal from the video source.

10. The method of claim 1, wherein the querying the group of documents includes: identifying one or more of the documents that contain at least one occurrence of the one or more of the proper names.

11. The method of claim 1, wherein the querying the group of documents includes: identifying one or more of the documents that contain at least one occurrence of the one or more of the proper names in a plurality of languages or variations of the one or more of the proper names.

12. The method of claim 1, further comprising: receiving a selection of one of the one or more of the documents, as a selected document; providing the selected document to the user with proper names visually distinguished.

13. The method of claim 12, wherein the one or more of the proper names are visually distinguished differently from other ones of the proper names in the selected document.

14. The method of claim 12, further comprising: permitting the user to cycle through occurrences of the one or more of the proper names in the selected document.

15. The method of claim 1, further comprising: receiving a selection of another one or more of the proper names from the one or more of the documents; querying the group of documents based on the another one or more of the proper names; and presenting, to the user, at least one of the documents from the group of documents as a result of the querying.

16. The method of claim 1, further comprising: providing a histogram that includes the proper names from the group of documents and a count of a number of occurrences of each of the proper names.

17. A system for providing name browsing across a plurality of documents, comprising: means for receiving an identification of a subset of documents from a set of documents, documents in the subset of documents including a plurality of proper names; means for receiving a selection of at least one of the proper names, as at least one selected proper name, from the subset of documents; means for searching the subset of documents to identify one or more of the documents in the subset of documents that include at least one occurrence of the at least one selected proper name; and means for presenting the one or more of the documents as a result of the searching.

18. A name browsing system, comprising: a database configured to store a plurality of multimedia documents, the documents including a plurality of proper names; and a server connected to the database and configured to: identify a group of the documents in the database, receive a selection of one of the proper names, as a selected proper name, from within a document in the group of documents, use the selected proper name to query the group of documents and identify one or more of the documents in the group of documents that include at least one occurrence of the selected proper name, and provide text from the one or more of the documents with the selected proper name visually distinguished.

19. The system of claim 18, wherein the proper names correspond to names of at least one of people, places, and organizations.

20. The system of claim 18, wherein the documents include documents that originate from at least two of audio sources, video sources, and text sources.

21. The system of claim 18, wherein when identifying a group of the documents, the server is configured to: receive one or more search terms from a user, search the database based on the one or more search terms, and identify documents in the database, as the group of documents, as a result of the searching.

22. The system of claim 18, wherein the server is further configured to: present the group of documents to a user, receive a selection of at least one document from the group of documents, and provide the at least one document to the user with proper names visually distinguished.

23. The system of claim 22, wherein the proper names relate to names of people, places and organizations; and wherein when providing the at least one document, the server is configured to: visually distinguish the proper names relating to people, places, and organizations differently.

24. The system of claim 18, wherein when using the selected proper name to query the group of documents, the server is configured to: identify one or more of the documents that contain at least one occurrence of the selected proper name in a plurality of languages or variations of the selected proper name.

25. The system of claim 24, wherein the database includes: an alias table that provides variations for the proper names.

26. The system of claim 24, wherein the database includes: a translingual table that provides translations of the proper names into a plurality of languages.

27. The system of claim 18, wherein when providing text from the one or more of the documents, the server is configured to: visually distinguish all proper names in the one or more of the documents.

28. The system of claim 27, wherein the selected proper name is visually distinguished differently from other ones of the proper names in the one or more of the documents.

29. The system of claim 18, wherein a user is permitted to cycle through occurrences of the selected proper name in the one or more of the documents.

30. The system of claim 18, wherein the server is further configured to: receive a selection of another one of the proper names, as another selected proper name, from the one or more of the documents, query the group of documents based on the another selected proper name, and present at least one of the documents from the group of documents as a result of the querying.

31. The system of claim 18, wherein the server is further configured to: provide a histogram that includes the proper names from the group of documents and a count of a number of occurrences of each of the proper names.

32. The system of claim 18, wherein the documents include audio documents; and wherein the system further comprises: an indexing system configured to: generate transcriptions of the audio documents, and locate proper names in the transcriptions of the audio documents.

33. A method for providing browsing by proper name across a plurality of documents, comprising: identifying a group of documents that includes multimedia documents containing a plurality of proper names; presenting text corresponding to the multimedia documents with the proper names visually distinguished; receiving a selection of one or more of the proper names, as one or more selected proper names; identifying at least one of variations of the one or more selected proper names and translations of the one or more selected proper names; querying the group of documents to identify one or more of the documents, as one or more identified documents, that contain one or more occurrences of the one or more selected proper names, the variations of the one or more selected proper names, or the translations of the one or more selected proper names; and providing the one or more identified documents as a result of the querying.

Description:

RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. §119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082, filed Jul. 3, 2002 and Provisional Application No. 60/419,214, filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.

GOVERNMENT CONTRACT

[0002] The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. N66001-00-C-8008, awarded by Defense Advanced Research Projects Agency (DARPA).

BACKGROUND OF THE INVENTION

[0003] 1. Field of the Invention

[0004] The present invention relates generally to multimedia environments and, more particularly, to systems and methods for browsing multimedia information by name.

[0005] 2. Description of Related Art

[0006] Much of the information that exists today is not easily manageable. For example, databases exist for storing different types of multimedia information. Typically, these databases treat audio and video differently from text. Audio and video data are usually assigned text annotations to facilitate their later retrieval. Traditionally, the audio and video data are assigned the text annotations manually, which is a time-consuming task. The annotations also tend to be insufficient to unambiguously describe the media content.

[0007] A common problem confronting a user of these large databases of multimedia documents is that it is difficult to find pertinent information about people, places, and organizations, or the relationships between such people, places, and organizations, both across media and across languages. Traditional query approaches frequently generate too many hits or too many inaccurate matches.

[0008] As a result, there is a need for systems and methods that permit users to easily browse a collection of documents for names of people, places, and organizations.

SUMMARY OF THE INVENTION

[0009] Systems and methods consistent with the present invention address this and other needs by permitting users to browse and query groups of documents by proper name. Using such systems and methods, a user may select a document from a collection of documents and have all of the proper names highlighted in some manner. The user may then select one of the proper names and be presented with all of the documents (or sections or passages) that mention that name.

[0010] In one aspect consistent with the principles of the invention, a system that permits document browsing by proper name is provided. The system identifies a subset of documents from a set of documents. The documents in the subset of documents include proper names. The system receives a selection of at least one of the proper names from the subset of documents and searches the subset of documents to identify one or more of the documents in the subset of documents that include at least one occurrence of the selected proper name(s). The system then presents the one or more of the documents as a result of the search.

[0011] In another aspect consistent with the present invention, a name browsing system that includes a database and a server is provided. The database stores multimedia documents that include occurrences of proper names. The server identifies a group of the documents in the database, receives a selection of one of the proper names from within a document in the group of documents, and uses the selected proper name to query the group of documents and identify one or more of the documents in the group of documents that include at least one occurrence of the selected proper name. The server then provides text from the one or more documents with the selected proper name visually distinguished.

[0012] In a further aspect consistent with the present invention, a method for providing browsing by proper name is provided. The method includes identifying a group of documents that includes multimedia documents containing a plurality of proper names; presenting text corresponding to the multimedia documents with the proper names visually distinguished; receiving a selection of one or more of the proper names; identifying variations of the one or more selected proper names and/or translations of the one or more selected proper names; querying the group of documents to identify one or more of the documents that contain one or more occurrences of the one or more selected proper names, the variations of the one or more selected proper names, or the translations of the one or more selected proper names; and providing the one or more identified documents as a result of the querying.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,

[0014] FIG. 1 is a diagram of a system in which systems and methods consistent with the present invention may be implemented;

[0015] FIG. 2 is an exemplary diagram of a portion of the database of FIG. 1 according to an implementation consistent with the principles of the invention;

[0016] FIG. 3 is a flowchart of exemplary processing for name browsing according to an implementation consistent with the principles of the invention;

[0017] FIG. 4 is an exemplary diagram of a user interface that may be presented to a user in response to a search query;

[0018] FIG. 5 is an exemplary diagram of the user interface of FIG. 4 when presenting a document with the proper names visually distinguished;

[0019] FIG. 6 is an exemplary diagram of the user interface of FIG. 4 when presenting resulting documents to the user; and

[0020] FIG. 7 is an exemplary diagram of a histogram table that may be presented to the user according to an implementation consistent with the principles of the invention.

DETAILED DESCRIPTION

[0021] The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.

[0022] Systems and methods consistent with the present invention may facilitate the browsing and querying of groups of documents by proper name. Using such systems and methods, a user may follow thematic threads through documents of a database. These threads may include links by proper name.

[0023] Exemplary System

[0024] FIG. 1 is a diagram of an exemplary system 100 in which systems and methods consistent with the present invention may be implemented. System 100 may include multimedia (MM) sources 110, indexing system 120, database 130, and server 140 connected to clients 150 via network 160. Network 160 may include any type of network, such as a local area network (LAN), a wide area network (WAN) (e.g., the Internet), a public telephone network (e.g., the Public Switched Telephone Network (PSTN)), a virtual private network (VPN), or a combination of networks. The various connections shown in FIG. 1 may be made via wired, wireless, and/or optical connections.

[0025] Multimedia sources 110 may include audio, video, and text sources. The audio sources may include any source of audio data, such as radio, telephone, and conversations. The video sources may include any source of video data, such as television, satellite, and a camcorder. The text sources may include any source of text, such as e-mail, web pages, newspapers, and word processing documents.

[0026] Indexing system 120 may include any mechanism that captures the data from multimedia sources 110, performs data processing and feature extraction on the data, and outputs analyzed, marked up, and enhanced language metadata. In one implementation consistent with the principles of the invention, indexing system 120 includes mechanisms, such as the ones described in John Makhoul et al., “Speech and Language Technologies for Audio Indexing and Retrieval,” Proceedings of the IEEE, Vol. 88, No. 8, August 2000, pp. 1338-1353, which is incorporated herein by reference.

[0027] In one implementation consistent with the principles of the invention, indexing system 120 may include audio, video, and text analyzers. The audio analyzer may receive an input audio stream or file from one or more audio sources and generate metadata therefrom. For example, the audio analyzer may segment the input stream/file by speaker, cluster audio segments from the same speaker, identify speakers known to the audio analyzer, and transcribe the spoken words. The audio analyzer may also segment the input stream/file based on topic and locate the names of people, places, and organizations (i.e., named entities). The audio analyzer may further analyze the input stream/file to identify the time at which each word is spoken. The audio analyzer may include any or all of this information in the metadata relating to the input audio stream/file.

[0028] The video analyzer may receive an input video stream or file from one or more video sources and generate metadata therefrom. For example, the video analyzer may segment the input stream/file by speaker, cluster video segments from the same speaker, identify speakers known to the video analyzer, and transcribe the spoken words. The video analyzer may also segment the input stream/file based on topic and locate the names of people, places, and organizations. The video analyzer may further analyze the input stream/file to identify the time at which each word is spoken. The video analyzer may include any or all of this information in the metadata relating to the input video stream/file.

[0029] The text analyzer may receive an input text stream or file from one or more text sources and generate metadata therefrom. For example, the text analyzer may segment the input stream/file based on topic and locate the names of people, places, and organizations. The text analyzer may further analyze the input stream/file to identify where each word occurs (possibly based on a character offset within the text). The text analyzer may also identify the author and/or publisher of the text. The text analyzer may include any or all of this information in the metadata relating to the input text stream/file.

[0030] As mentioned above, the audio, video, and text analyzers of indexing system 120 may locate proper names within input streams or files. The proper names may include the names of people, places, and organizations. In one implementation consistent with the present invention, the audio, video, and text analyzers use techniques similar to the ones described in D. Bikel et al., “An Algorithm that Learns What's in a Name,” Machine Learning, Vol. 34, 1999, pp. 211-231, which is incorporated herein by reference, to locate proper names within an audio, video, and/or text stream or file.

[0031] Server 140 may include a computer (e.g., a processor executing instructions from a memory) or another device that is capable of managing database 130 and servicing client requests for information. Server 140 may provide requested information to a client 150, possibly in the form of a HyperText Markup Language (HTML) document or a web page. Client 150 may include a personal computer, a laptop, a personal digital assistant, or another type of device that is capable of interacting with server 140 to obtain information of interest. Client 150 may present the information to a user via a graphical user interface, possibly within a web browser window.

[0032] Exemplary Database

[0033] Database 130 may include a relational database that includes multiple, possibly interrelated, tables. FIG. 2 is an exemplary diagram of a portion of database 130 according to an implementation consistent with the principles of the invention. In the portion shown in FIG. 2, database 130 includes a document table 210, a section table 220, a passage table 230, a full text table 240, a topic labels table 250, a named entity table 260, an alias table 270, and a translingual table 280. Before describing what is stored in these tables, it may be useful to define what is meant by document, section, and passage.

[0034] A document refers to a body of media that is contiguous in time (from beginning to end or from time A to time B), that has been processed by indexing system 120, and from which indexing system 120 has extracted features. Examples of documents might include a radio broadcast, such as NPR Morning Edition on Feb. 7, 2002, at 6:00 a.m. eastern; a television broadcast, such as NBC News on Mar. 19, 2002, at 6:00 p.m. eastern; and a newspaper, such as the Washington Post for Jan. 15, 2002.

[0035] A section refers to a contiguous region of a document that pertains to a particular theme or topic. Examples of sections might include local news, sports scores, and weather reports. Sections do not span documents, but are wholly contained within them. A document may have areas that do not have an assigned section. It is also possible for a document to have no sections.

[0036] A passage refers to a contiguous region within a section that has a certain linguistic or structural property. For example, a passage may refer to a paragraph within a text document or a speaker boundary within an audio or video document. Passages do not span sections, but are wholly contained within them. A section may have areas that do not have an assigned passage. It is also possible for a section to have no passages.

[0037] Documents, sections, and passages may be considered to form a hierarchy because a document may have zero or more sections and a section may have zero or more passages.
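The hierarchy described above can be illustrated with a minimal sketch. The class names, field names, and sample values below are hypothetical and chosen only for illustration; they are not taken from the described implementation. The sketch shows that a document may hold zero or more sections and a section may hold zero or more passages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Passage:
    passage_key: int  # identifies the passage within its section
    text: str


@dataclass
class Section:
    section_key: int  # identifies the section within its document
    topic: str
    passages: List[Passage] = field(default_factory=list)  # zero or more


@dataclass
class Document:
    document_key: int
    title: str
    sections: List[Section] = field(default_factory=list)  # zero or more


def count_passages(doc: Document) -> int:
    """Count every passage contained in the sections of one document."""
    return sum(len(section.passages) for section in doc.sections)


# Hypothetical sample data: one section with two passages, one with none.
doc = Document(1, "Morning radio broadcast", [
    Section(1, "local news", [Passage(1, "Traffic update."),
                              Passage(2, "City council vote.")]),
    Section(2, "weather"),
])
empty_doc = Document(2, "Unsegmented clip")  # a document with no sections
```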

[0038] Document table 210, section table 220, and passage table 230 may include a set of keys that are common to all types of media, such as a document key, a section key, and a passage key. A key in a relational database is a field or a combination of fields in a table that uniquely identify a record in the table or reference a record in another table. There are typically two types of keys: a primary key and a foreign key. A primary key uniquely identifies a record within a table. In other words, each record in a table is uniquely identified by one or more fields making up its primary key. A foreign key is a field or a combination of fields in one table whose values match those of a primary key of another table.

[0039] The document key may include a field that uniquely identifies a document. The section key may include a field that uniquely identifies a section within a particular document. The passage key may include a field that uniquely identifies a passage within a particular section. By combining the keys, any passage or section of a document may be uniquely identified based on the location of the passage or section within the document. For example, using a document key, section key, and passage key to uniquely identify a passage, it is easy to determine the section (using the section key) and document (using the document key) in which the passage is located. The relationship also works in the other direction: given a document key, the sections and passages that belong to the document may be determined.
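One way to picture the combined keys is a small relational schema. The following is a sketch only, using Python's standard sqlite3 module; the table names, column names, and sample rows are assumptions made for illustration and do not reproduce the tables of FIG. 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE document (
    document_key INTEGER PRIMARY KEY,
    title        TEXT
);
CREATE TABLE section (
    document_key INTEGER NOT NULL REFERENCES document(document_key),
    section_key  INTEGER NOT NULL,
    PRIMARY KEY (document_key, section_key)
);
CREATE TABLE passage (
    document_key INTEGER NOT NULL,
    section_key  INTEGER NOT NULL,
    passage_key  INTEGER NOT NULL,
    speaker      TEXT,
    PRIMARY KEY (document_key, section_key, passage_key),
    FOREIGN KEY (document_key, section_key)
        REFERENCES section(document_key, section_key)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO document VALUES (1, 'Evening news')")
conn.executemany("INSERT INTO section VALUES (?, ?)", [(1, 1), (1, 2)])
conn.executemany("INSERT INTO passage VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "anchor"), (1, 2, 1, "reporter"), (1, 2, 2, "anchor")])


def document_of_passage(document_key, section_key, passage_key):
    """From a fully keyed passage, find the title of its document."""
    row = conn.execute(
        "SELECT d.title FROM passage p "
        "JOIN document d ON d.document_key = p.document_key "
        "WHERE p.document_key = ? AND p.section_key = ? AND p.passage_key = ?",
        (document_key, section_key, passage_key)).fetchone()
    return row[0] if row else None


def passages_in_document(document_key):
    """From a document key, list the (section, passage) keys it contains."""
    return conn.execute(
        "SELECT section_key, passage_key FROM passage "
        "WHERE document_key = ? ORDER BY section_key, passage_key",
        (document_key,)).fetchall()
```

The two helper functions correspond to the two directions of lookup: from a passage up to its document, and from a document down to its passages.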

[0040] Document table 210 may store information regarding multiple documents. Document table 210 may include one record per document, where each record may include a document key and miscellaneous other fields. The miscellaneous other fields may include fields relating to the time the document was created, the source of the document, a title of the document, the time the document started, the duration of the document, the region, subregion, and country in which the document originated, and/or the language in which the document was created.

[0041] Section table 220 may store information regarding multiple sections. Section table 220 may include one record per section, where each record may include a document key, a section key, and miscellaneous other fields. The miscellaneous other fields may include fields relating to the start time of the section, the duration of the section, and/or the language in which the section was created. Passage table 230 may store information regarding multiple passages. Passage table 230 may include one record per passage, where each record may include a document key, a section key, a passage key, and miscellaneous other fields. The miscellaneous other fields may include fields relating to the start time of the passage, the duration of the passage, the name of a speaker in the passage, the gender of a speaker in the passage, and/or the language in which the passage was created.

[0042] Full text table 240 may store text from a section and/or document. Full text table 240 may include one record per section and/or document, where each record may include a document key, a section key, and miscellaneous other fields. The miscellaneous other fields may include the full text (including a transcription when the document is an audio or video document) of the document identified by the document key. Topic labels table 250 may store topics relating to a section and/or document. Topic labels table 250 may include one record per section and/or document, where each record may include a document key, a section key, a topic key, and miscellaneous other fields. The miscellaneous other fields may include topics, ranks, and scores relating to the section identified by the section key and/or the document identified by the document key.

[0043] Named entity table 260 may store data relating to proper names that appear in passages, sections, and/or documents. Named entity table 260 may include one record per proper name occurrence (i.e., a proper name may occur multiple times within named entity table 260, such as once for each mention of the proper name within the passage, section, and/or document). Each record within named entity table 260 may include a document key, a section key, a passage key, a named entity key, and miscellaneous other fields. The named entity key may include a proper name that refers to a person, place, or organization within the passage identified by the passage key, the section identified by the section key, and/or the document identified by the document key. The miscellaneous other fields may include the type of proper name (e.g., person, place, or organization).
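A record-per-occurrence layout of this kind can be sketched as follows. The record type, field names, and sample rows are hypothetical. The sketch shows two operations that such a layout supports: finding the documents that mention a selected proper name, and counting occurrences of each proper name (the kind of count a histogram such as the one of FIG. 7 might present).

```python
from collections import Counter
from typing import NamedTuple


class NamedEntityRecord(NamedTuple):
    document_key: int
    section_key: int
    passage_key: int
    named_entity_key: str  # the proper name itself
    entity_type: str       # "person", "place", or "organization"


# Hypothetical sample data: one record per occurrence of a proper name.
records = [
    NamedEntityRecord(1, 1, 1, "Kenneth Lay", "person"),
    NamedEntityRecord(1, 1, 2, "Enron", "organization"),
    NamedEntityRecord(1, 2, 1, "Kenneth Lay", "person"),
    NamedEntityRecord(2, 1, 1, "Houston", "place"),
    NamedEntityRecord(2, 1, 1, "Enron", "organization"),
]


def documents_mentioning(name, recs):
    """Return the sorted keys of documents with at least one occurrence of name."""
    return sorted({r.document_key for r in recs if r.named_entity_key == name})


def name_histogram(recs):
    """Count the number of occurrences of each proper name."""
    return Counter(r.named_entity_key for r in recs)
```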

[0044] Alias table 270 may store data relating to coreferences (or aliases) for proper names included in named entity table 260. Alias table 270 may include one record per proper name, where each record may include a named entity key and one or more alias fields. The alias fields may store coreferences (or aliases) relating to the proper name identified by the named entity key. A coreference for a proper name is another way to refer to the person, place, or organization corresponding to the proper name. For example, there are multiple ways to identify the current President of the United States: George Walker Bush, George W. Bush, George Bush, Bush, President Bush, and The President. In some cases, a coreference may include a different manner of spelling a proper name. For example, the proper name “Yasser Arafat” is sometimes spelled as Yaser Arafat, Yassir Arafat, and Yasir Arafat.
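As an illustration only, an alias lookup might be sketched with a dictionary that maps a named entity key to its coreferences. The dictionary layout, the function names, and the case-insensitive substring matching are assumptions of this sketch, not the described implementation; the aliases are the examples given above.

```python
# Hypothetical alias table: named entity key -> list of coreferences.
ALIAS_TABLE = {
    "Yasser Arafat": ["Yaser Arafat", "Yassir Arafat", "Yasir Arafat"],
    "George W. Bush": ["George Walker Bush", "George Bush", "President Bush"],
}


def expand_name(name, alias_table):
    """Return the name followed by any aliases recorded for it."""
    return [name] + list(alias_table.get(name, []))


def mentions(text, name, alias_table):
    """True if text contains the name or any alias (case-insensitive)."""
    lowered = text.lower()
    return any(variant.lower() in lowered
               for variant in expand_name(name, alias_table))
```

With such a table, a query for "Yasser Arafat" would also match a document that spells the name "Yasir Arafat".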

[0045] Translingual table 280 may store data relating to translations of proper names included in named entity table 260 and/or alias table 270. Translingual table 280 may include one record per proper name and/or coreference, where each record may include a named entity key and one or more translation fields. The translation fields may store translations of the proper name identified by the named entity key in multiple languages. It is useful to note that proper names can be translated nearly perfectly across languages.
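A translingual lookup can be sketched in a similar way. The table contents, language codes, sample documents, and simple substring matching below are assumptions for illustration; the sketch shows how a selected proper name might be expanded into its translations before a group of documents in several languages is queried.

```python
# Hypothetical translingual table: named entity key -> {language: translation}.
TRANSLINGUAL_TABLE = {
    "Moscow": {"fr": "Moscou", "de": "Moskau", "ru": "Москва"},
}

# Hypothetical group of documents in more than one language.
DOCS = {
    1: "The delegation arrived in Moscow on Monday.",
    2: "La délégation est arrivée à Moscou lundi.",
    3: "The delegation stayed in Paris.",
}


def query_terms(name, translingual_table):
    """Return the name plus its recorded translations, without duplicates."""
    terms = [name]
    for translation in translingual_table.get(name, {}).values():
        if translation not in terms:
            terms.append(translation)
    return terms


def search(name, docs, translingual_table):
    """Return sorted keys of documents containing the name or a translation."""
    terms = query_terms(name, translingual_table)
    return sorted(key for key, text in docs.items()
                  if any(term in text for term in terms))
```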

[0046] Exemplary Processing

[0047] FIG. 3 is a flowchart of exemplary processing for name browsing according to an implementation consistent with the principles of the invention. Processing may begin with a user accessing server 140 via a client 150. The user may, in some way, identify a group of documents (act 310). For example, the user may query server 140 to retrieve the group of documents. In this case, the user may input one or more search terms into client 150. Client 150 may generate a query from the search terms and send the query to server 140. Server 140 may use the query to access database 130 to retrieve documents corresponding to the search terms. Server 140 may present the documents to the user via, for example, a browser interface on client 150.
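The query-based identification of a group of documents (act 310) might be sketched as below. The sketch assumes, purely for illustration, that a document matches when its full text contains every search term; the description does not specify the matching rule, and the function name and sample texts are hypothetical.

```python
def identify_group(search_terms, full_text):
    """Return sorted keys of documents whose text contains all search terms.

    Matching every term (rather than any term) is an assumption of this sketch.
    """
    terms = [term.lower() for term in search_terms]
    return sorted(key for key, text in full_text.items()
                  if all(term in text.lower() for term in terms))


# Hypothetical full text, keyed by document key.
FULL_TEXT = {
    1: "Kenneth Lay invoked the fifth amendment.",
    2: "Kenneth Lay testified on Tuesday.",
    3: "The fifth inning ended early.",
}

group = identify_group(["Kenneth Lay", "fifth"], FULL_TEXT)
```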

[0048] Alternatively, the user may identify the group of documents in other ways. For example, the user may provide the documents to server 140, such as providing links to the documents to server 140. In this case, server 140 may obtain the documents (if necessary), send the documents to indexing system 120 for processing, and return the processed documents to the user via client 150.

[0049] The documents in the group may include any type of document in any language. For example, the documents might include audio documents, video documents, and/or text documents. The documents might also include documents in English, Arabic, Chinese, etc.

[0050] In any event, server 140 may present the identified group of documents to the user via client 150 (act 320). For example, server 140 may present the documents in some order that is meaningful to the user, such as a list of documents that are sorted by their relevance to the user's search terms. Alternatively, the documents may be presented in no particular order.

[0051] FIG. 4 is an exemplary diagram of a user interface 400 that may be presented to a user in response to a search query. In this particular implementation, the user interface takes the form of a browser interface.

[0052] User interface 400 may include a search criteria section 410 relating to the search query entered by the user and the number of documents resulting from the search query. Assume that the user entered the following search terms: “Kenneth Lay” and “fifth.” Assume that server 140 uses the search terms to identify a group of 68 documents within database 130. In this case, search criteria section 410 might include the search terms “Kenneth Lay” and “fifth” and indicate that the group of documents includes 68 documents identified as a result of the search.

[0053] User interface 400 may present the group of documents to the user in a document list section 420. The documents may be presented in a random order or an order that is meaningful to the user or selected by the user. For example, the documents may be presented in a chronological order based on the dates on which the documents were created. The documents may be presented in order with the newer documents being presented higher in the list than older documents, or vice versa.

[0054] Each of the documents may have an associated icon that indicates the document's type. For example, a document icon 422 may be used to indicate that the document corresponds to a text document. A speaker icon 424 may be used to indicate that the document corresponds to an audio document. A camcorder icon 426 may be used to indicate that the document corresponds to a video document.

[0055] Returning to FIG. 3, the user may peruse the documents and select one or more of them (act 330). The user may use conventional techniques, such as a mouse click, to select one or more of the documents from the group. In response to the user's selection, server 140 may present the document to the user with the proper names visually distinguished in some manner (act 340). As described above, indexing system 120 may process documents to, among other things, identify the names of people, places, and organizations (i.e., proper names) within the documents. Indexing system 120 may tag, or otherwise label, words in a document that correspond to proper names. Server 140 may use these tags to visually distinguish (e.g., highlight) the proper names in a document. Server 140 may visually distinguish proper names relating to people, places, and organizations differently. For example, server 140 may use different colors to differentiate among proper names relating to people, places, and organizations.
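As a minimal sketch of how tags produced during indexing might drive the visual distinction described above, the following Python function wraps each tagged proper name in a colored HTML span. The tag format (start offset, end offset, name type), the color choices, and the function name are assumptions made for illustration; the specification does not prescribe a tag format or a rendering technique.

```python
from html import escape
from typing import List, Tuple

# Hypothetical color per name type; any visually distinct styles would do.
COLORS = {"person": "yellow", "place": "lightgreen", "organization": "lightblue"}

# A tag is (start offset, end offset, name type); tags are assumed not to overlap.
Tag = Tuple[int, int, str]


def highlight(text: str, tags: List[Tag]) -> str:
    """Return HTML in which each tagged proper name is colored by its type."""
    parts = []
    cursor = 0
    for start, end, name_type in sorted(tags):
        # Untagged text between names is escaped and copied through unchanged.
        parts.append(escape(text[cursor:start]))
        parts.append('<span style="background:{}">{}</span>'.format(
            COLORS[name_type], escape(text[start:end])))
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)
```

Because the offsets refer to the original text, the same tags could also be used to make each name selectable, for example by emitting a link in place of the span.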

[0056] FIG. 5 is an exemplary diagram of user interface 400 when presenting a document with the proper names visually distinguished. When the user selects a particular document from the list of documents, user interface 400 may present a document section 510. Document section 510 may include direction arrows 512, document type indicator 514, document descriptor 516, and document text 518.

[0057] Direction arrows 512 may include left and right direction arrows that can be used to cycle through occurrences of the search terms within the document in a conventional manner. Document type indicator 514 may include an identifier that indicates whether the document corresponds to a text document, an audio document, or a video document. Document descriptor 516 may include a title of the document. In the case of audio or video documents, however, document descriptor 516 may alternatively include a list of topics that relate to the documents.

[0058] Document text 518 may include the text of the document identified by document descriptor 516. In the case of audio or video documents, document text 518 may include a transcription of the audio or video documents. Within document text 518, the search terms may be visually distinguished (e.g., highlighted) in some manner to aid the user in determining the relevance of the document. Also within document text 518, proper names may be visually distinguished differently from the search terms to assist the user in viewing the document. Each of the proper names may also be selectable by the user.

[0059] Returning to FIG. 3, server 140 determines whether the user selects one of the proper names in the document (act 350). The user may use conventional techniques, such as a mouse click, to select a proper name. In response to selection of a proper name, server 140 may use the proper name as a query to search the group of documents for occurrences of the name (act 360). For example, server 140 may search for documents, within the group of documents already identified by the user, that contain one or more occurrences of the proper name. Server 140 may also locate documents that contain variations of the proper name using alias table 270 or the proper name in other languages using translingual table 280.
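The query of act 360 might be sketched as follows, assuming for illustration that the group of documents is held as a mapping from a document identifier to document text and that alias table 270 and translingual table 280 have been reduced to simple name-to-variants mappings. All names and the case-insensitive substring match are simplifying assumptions; an actual implementation would more likely match on the tags assigned by indexing system 120.

```python
from typing import Dict, List, Set


def expand_name(name: str,
                aliases: Dict[str, List[str]],
                translations: Dict[str, List[str]]) -> Set[str]:
    """Collect the selected name together with its aliases and translations."""
    variants = {name}
    variants.update(aliases.get(name, []))
    variants.update(translations.get(name, []))
    return variants


def search_group(group: Dict[str, str], name: str,
                 aliases: Dict[str, List[str]],
                 translations: Dict[str, List[str]]) -> List[str]:
    """Return ids of documents in the group that mention the name or a variant."""
    variants = {v.lower() for v in expand_name(name, aliases, translations)}
    hits = []
    for doc_id, text in group.items():
        lowered = text.lower()
        # Simplification: substring match instead of tag-based matching.
        if any(v in lowered for v in variants):
            hits.append(doc_id)
    return hits
```

Note that the search runs only over the group already identified by the user, not over the whole of database 130.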

[0060] Server 140 may present the resulting documents to the user via a user interface (act 370). FIG. 6 is an exemplary diagram of user interface 400 when presenting resulting documents to the user. Assume that the user desired documents relating to the company Enron and, therefore, selected the name “Enron” in one of the documents about Kenneth Lay. In this case, search criteria section 410 may include additional information relating to the particular proper name selected by the user (e.g., “Enron”).

[0061] User interface 400 may also present a document table 610 to the user. Document table 610 may include a title column 612, a mentions column 614, and a date column 616. Title column 612 may include a list of titles of documents that relate to the proper name selected by the user. In the case of audio or video documents, title column 612 may alternatively include lists of topics that correspond to the audio or video documents. Mentions column 614 may include the number of times that the proper name occurs in the corresponding documents. Variations of the proper name (from alias table 270) and the proper name in different languages (from translingual table 280) may be included when calculating the number of occurrences of the proper name. Date column 616 may include the date on which the corresponding documents were created.
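One way the rows of document table 610 might be computed is sketched below. The `Document` and `Row` types, the function names, and the use of a regular expression are assumptions for illustration only. Variants (such as those drawn from alias table 270 or translingual table 280) are passed in as a plain list and are counted together with the selected name.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Document:
    title: str
    created: date
    text: str


@dataclass
class Row:
    title: str
    mentions: int
    created: date


def count_mentions(text: str, variants: Iterable[str]) -> int:
    """Count non-overlapping occurrences of any variant, ignoring case."""
    # Longest variants first, so "Enron Corp." is not also counted as "Enron".
    ordered = sorted(set(variants), key=len, reverse=True)
    if not ordered:
        return 0
    pattern = re.compile("|".join(re.escape(v) for v in ordered), re.IGNORECASE)
    return len(pattern.findall(text))


def build_table(docs: Iterable[Document], variants: Iterable[str]) -> List[Row]:
    """Build one row (title, mentions, date) per document that mentions the name."""
    variants = list(variants)
    rows = []
    for doc in docs:
        mentions = count_mentions(doc.text, variants)
        if mentions > 0:
            rows.append(Row(doc.title, mentions, doc.created))
    return rows
```

Ordering the alternatives longest-first is a design choice of this sketch; it keeps an alias that contains the selected name from being counted twice.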

[0062] The information in document table 610 may be presented in a number of different ways, whichever is most meaningful to the user. For example, the information may be sorted by title, number of mentions, and/or date. The titles may be arranged in any order (e.g., alphabetically). The mentions may be sorted from least to most or from most to least. The dates of creation may be sorted from most recent to least recent or from least recent to most recent. Alternatively, a combination of these sorting arrangements may be used.
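A combination of sorting arrangements such as those described above might be sketched as follows. The row representation (a dictionary with `title`, `mentions`, and `date` entries) and the function name are hypothetical. The sketch relies on the documented stability of Python's sort: applying the keys from lowest to highest priority yields the combined ordering.

```python
from operator import itemgetter
from typing import Any, Dict, List, Tuple


def sort_rows(rows: List[Dict[str, Any]],
              keys: List[Tuple[str, bool]]) -> List[Dict[str, Any]]:
    """Sort rows by several (field, descending) keys given in priority order."""
    result = list(rows)
    # Stable sort: later passes preserve the order set by earlier passes
    # among rows that compare equal, so the lowest-priority key goes first.
    for field_name, descending in reversed(keys):
        result.sort(key=itemgetter(field_name), reverse=descending)
    return result
```

For example, passing `[("mentions", True), ("title", False)]` orders rows from most to least mentions and breaks ties alphabetically by title.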

[0063] Returning to FIG. 3, the user may peruse the documents and select one of them to obtain text relating to the document. In response to the user's selection, server 140 may present the document to the user in a manner similar to that described above with regard to FIG. 5. Server 140 may then determine whether the user selects one of the proper names in the document (act 380). If so, server 140 may return to act 360 to perform a query over the documents using the selected proper name.
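The repeated cycle of acts 360 through 380 might be sketched as a simple loop in which each newly selected proper name is used to query the same group of documents. The function names, the mapping used to represent the group, and the substring match are assumptions for illustration only.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def query_group(group: Dict[str, str], name: str) -> List[str]:
    """Return ids of documents in the group whose text contains the name."""
    needle = name.lower()
    return [doc_id for doc_id, text in group.items() if needle in text.lower()]


def browse(group: Dict[str, str],
           selections: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """For each successively selected name, query the same group of documents."""
    for name in selections:
        # Each selection triggers a fresh query over the unchanged group.
        yield name, query_group(group, name)
```

In this sketch the group itself never changes between selections, so a user can follow one name to a document and a second name from that document without leaving the originally identified set.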

[0064] At any time, the user may obtain a high level view of the proper names contained within the documents. In this case, the user may cause a histogram to be created. FIG. 7 is an exemplary diagram of a histogram table 700 that may be presented to the user according to an implementation consistent with the principles of the invention. Histogram table 700 may include a name column 710 and a number of mentions column 720. In other implementations, histogram table 700 may include additional, different, or fewer columns.

[0065] Name column 710 may include a list of proper names that appear in the documents. A representation of each proper name may appear once in name column 710. Variations of a proper name (from alias table 270) and the proper name in different languages (from translingual table 280) may also be represented by the representation of the proper name. Number of mentions column 720 may identify the number of times the corresponding proper names are mentioned in the documents. The mentions counted for a proper name may occur within a single document or across multiple documents.
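A histogram of this kind might be computed as in the following sketch, which assumes that the proper-name mentions of each document have already been extracted as lists of strings and that variations and translations have been reduced to a mapping from each variant to a single representative name. Both inputs, and the function name, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_histogram(mentions_per_doc: Iterable[Iterable[str]],
                    canonical: Dict[str, str]) -> List[Tuple[str, int]]:
    """Count mentions per representative name across all documents."""
    counts: Counter = Counter()
    for mentions in mentions_per_doc:
        for surface in mentions:
            # Variants fold into their representative; other names stand alone.
            counts[canonical.get(surface, surface)] += 1
    # Most-to-least mentions, with ties broken alphabetically by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Each representative name appears once in the result, and its count covers mentions within a single document as well as across multiple documents.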

[0066] The information in histogram table 700 may be presented in a number of different ways, whichever is most meaningful to the user. For example, the information may be sorted by proper name and/or number of mentions. The proper names may be sorted by type (e.g., person, place, and organization) or arranged in any other order (e.g., alphabetically). The mentions may be sorted from least to most or from most to least. Alternatively, a combination of these sorting arrangements may be used.

[0067] Any of the proper names in table 700 may be selected and the corresponding documents presented to the user. In one implementation, the documents are presented to the user via a user interface, such as user interface 400 of FIG. 6.

[0068] Conclusion

[0069] Systems and methods consistent with the present invention provide a way for a user to follow thematic threads through documents, which, in this case, may include links by proper name. The user may start with a query over a database of documents to identify some subset of documents that relate to one or more items of interest to the user. In some implementations, the subset of documents includes the entire database of documents. The user may then select one or more proper names to determine their occurrence within the subset of documents. The user may view the documents and follow the links created through the documents by the proper name(s).

[0070] The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.

[0071] For example, while server 140 has been described as performing certain functions with respect to the presentation of documents to a user, one or more of these functions may be performed by client 150 in implementations consistent with the principles of the invention.

[0072] Also, a series of acts has been presented with regard to FIG. 3. The order of the acts may vary in other implementations consistent with the present invention. Further, non-dependent acts may be performed in parallel.

[0073] No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. The scope of the invention is defined by the claims and their equivalents.