[0001] This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082, filed Jul. 3, 2002, and Provisional Application No. 60/419,214, filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.
[0002] This application is related to U.S. patent application, Ser. No. ______ (Docket No. 02-4036), entitled, “Systems and Methods for Facilitating Playback of Media,” filed concurrently herewith and incorporated herein by reference.
[0004] 1. Field of the Invention
[0005] The present invention relates generally to language translation and, more particularly, to systems and methods for aiding a human in translating audio data.
[0006] 2. Description of Related Art
[0007] Translating an audio signal involves three major tasks: selection, translation, and publication. During selection, a human translator chooses a segment of audio to translate. During translation, the translator translates the chosen audio segment. During publication, the translator publishes or saves the translation results.
[0008] Human translation is a slow, time-consuming process. As a result, the human translator typically translates only important segments of an audio signal. The translator will often work from a recorded audio signal, skimming the complete signal and listening for segments that are suitable for translation. The translator then replays the selected segments, translating the speech and transcribing it with a word processor. To accurately transcribe the audio segments, the translator will usually go through the audio segments many times, rewinding the audio repeatedly to keep the translation synchronized with the playback. Only after the translator feels that the translated audio segment is accurate and complete will the translator publish the translation results.
[0009] As a result, there is a need for mechanisms that facilitate and expedite the translation of an audio signal.
[0010] Systems and methods consistent with the present invention address this and other needs by providing a transcription of an audio signal, along with the original audio signal, to a translator to assist the translator in translating the audio signal. The systems and methods visually synchronize the playback of the audio signal with the transcription to aid the translation process.
[0011] In one aspect consistent with the principles of the invention, a system aids a user in translating an audio signal that includes speech from one language to another. The system retrieves a textual representation of the audio signal and presents the textual representation to the user. The system receives selection of a segment of the textual representation for translation and obtains a portion of the audio signal corresponding to the segment. The system then provides the segment of the textual representation and the portion of the audio signal to the user to help the user translate the audio signal.
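By way of illustration only, the mapping from a selected segment of the textual representation to the corresponding portion of the audio signal might be sketched as follows. The sketch assumes that each word of the transcription carries start and end times in seconds; the names `TimedWord` and `audio_span_for_selection` are hypothetical and do not appear in the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimedWord:
    """One transcribed word with assumed timing metadata (seconds)."""
    text: str
    start: float
    end: float


def audio_span_for_selection(
    words: List[TimedWord], first: int, last: int
) -> Tuple[float, float]:
    """Return the (start, end) time covering words[first..last], inclusive."""
    if not (0 <= first <= last < len(words)):
        raise ValueError("selection is outside the transcription")
    return words[first].start, words[last].end


# Hypothetical transcription data, for illustration only.
words = [
    TimedWord("good", 0.0, 0.4),
    TimedWord("morning", 0.4, 0.9),
    TimedWord("from", 1.1, 1.3),
    TimedWord("the", 1.3, 1.4),
    TimedWord("capital", 1.4, 2.0),
]

# The user selects "from the capital" (word indices 2 through 4).
span = audio_span_for_selection(words, 2, 4)
print(span)
```

The returned time span could then be used to request only the matching portion of the audio signal rather than the entire recording.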
[0012] According to another aspect of the invention, a graphical user interface is provided. The graphical user interface includes a transcription section, a translation section, and a play button. The transcription section includes a transcription of non-text information in a first language. The translation section receives a translation of the non-text information into a second language. The play button, when selected, causes retrieval of the non-text information to be initiated, playing of the non-text information, and the playing of the non-text information to be visually synchronized with the transcription in the transcription section.
[0013] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024] The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.
[0025] Systems and methods consistent with the present invention aid a human translator in translating an audio stream from one language to another. The systems and methods present the human translator with the audio stream, along with a transcription of the audio stream. The systems and methods visually synchronize the playback of the audio with the words in the transcription. As will be apparent below, such systems and methods help the human translator translate the audio stream more quickly, efficiently, and accurately.
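One possible way to visually synchronize playback with the words of a transcription is to look up, at each playback instant, the word whose time interval contains the current position, and to highlight that word. The following is a minimal sketch under the assumption that per-word start and end times are available as sorted lists; the function name `word_at` is hypothetical, and the binary search is merely one convenient technique, not necessarily the one used by the described system.

```python
from bisect import bisect_right
from typing import List, Optional


def word_at(
    start_times: List[float], end_times: List[float], position: float
) -> Optional[int]:
    """Return the index of the word being spoken at `position` seconds.

    `start_times` must be sorted ascending. Returns None when the
    position falls in silence between words or before the first word.
    """
    i = bisect_right(start_times, position) - 1
    if i < 0 or position >= end_times[i]:
        return None
    return i


# Hypothetical word timings, for illustration only.
starts = [0.0, 0.4, 1.1, 1.3, 1.4]
ends = [0.4, 0.9, 1.3, 1.4, 2.0]

for t in (0.5, 1.0, 1.3):
    print(t, word_at(starts, ends, t))
```

A display routine could call such a lookup on each playback tick and move the highlight only when the returned index changes.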
[0026]
[0027] Generally, clients
[0028] Each of the components of system
[0029] Server
[0030] Processor
[0031] Input device
[0032] As will be described in detail below, server
[0033] Execution of the sequences of instructions contained in memory
[0034] Metadata database
[0035] For example, the media processing system may segment an input audio stream by speaker, cluster audio segments from the same speaker, identify speakers known to the system, and transcribe the spoken words. The media processing system may also segment the input stream into stories, based on their topic content, and locate the names of people, places, and organizations. The media processing system may further analyze the input stream to identify when each word is spoken. The media processing system may include any or all of this information in the transcription and metadata relating to the input stream.
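As an illustration, the transcription and metadata described above could be held in a structure along the following lines. This is only a sketch: the class names `Word`, `SpeakerTurn`, and `Story`, and their fields, are assumptions made for the example and are not taken from the media processing system itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    start: float  # assumed: seconds from the start of the input stream
    end: float


@dataclass
class SpeakerTurn:
    speaker: str  # a known speaker name or a cluster label
    words: List[Word]


@dataclass
class Story:
    topic: str
    turns: List[SpeakerTurn]
    named_entities: List[str] = field(default_factory=list)

    def speakers(self) -> List[str]:
        """Distinct speakers in order of first appearance."""
        seen: List[str] = []
        for turn in self.turns:
            if turn.speaker not in seen:
                seen.append(turn.speaker)
        return seen

    def text(self) -> str:
        """The plain transcription, without timing or speaker labels."""
        return " ".join(w.text for turn in self.turns for w in turn.words)


# Hypothetical data, for illustration only.
story = Story(
    topic="weather",
    turns=[
        SpeakerTurn("Anchor", [Word("storms", 0.0, 0.5), Word("ahead", 0.5, 0.9)]),
        SpeakerTurn("Reporter", [Word("in", 1.2, 1.3), Word("Boston", 1.3, 1.8)]),
        SpeakerTurn("Anchor", [Word("thanks", 2.0, 2.4)]),
    ],
    named_entities=["Boston"],
)
print(story.speakers())
print(story.text())
```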
[0036] Database of original media
[0037] The original audio may be stored in such a way that it is easily retrievable as a whole and in portions. For example, a portion of an audio signal may be retrieved by specifying that the portion of the signal that occurred between 8:05 a.m. and 8:08 a.m. is desired. The database
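For instance, if the audio were stored as uncompressed samples at a fixed rate together with the clock time at which recording began, a request for the portion between two clock times could be converted into sample offsets as sketched below. The storage layout, the function name `sample_range`, and the 16 kHz rate are all assumptions made for this example; the description does not specify how the database stores the audio.

```python
from datetime import datetime
from typing import Tuple


def sample_range(
    recording_start: datetime,
    window_start: datetime,
    window_end: datetime,
    sample_rate: int,
) -> Tuple[int, int]:
    """Convert a clock-time window into (first, last) sample offsets.

    `last` is exclusive, so the portion contains last - first samples.
    """
    if window_start < recording_start:
        raise ValueError("window starts before the recording")
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    first = int((window_start - recording_start).total_seconds() * sample_rate)
    last = int((window_end - recording_start).total_seconds() * sample_rate)
    return first, last


# Hypothetical recording that began at 8:00 a.m.; request 8:05 to 8:08 a.m.
began = datetime(2002, 7, 3, 8, 0, 0)
first, last = sample_range(
    began,
    datetime(2002, 7, 3, 8, 5, 0),
    datetime(2002, 7, 3, 8, 8, 0),
    sample_rate=16000,
)
print(first, last)  # 4800000 7680000
```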
[0038] Client
[0039]
[0040] Processor
[0041] Input devices
[0042] As will be described in detail below, client
[0043] The software instructions may be read into memory
[0044]
[0045] Client
[0046] Server
[0047] Client
[0048] GUI
[0049] Transcription section
[0050] GUI
[0051] GUI
[0052] Returning to
[0053]
[0054] Returning to
[0055] The media player may then retrieve the desired portion of the original audio signal (act
[0056] As the media player plays back the original audio, client
[0057]
[0058] The user may be permitted to stop the playback at any time. The user may also be permitted to control the playback by, for example, fast forwarding, speeding it up, slowing it down, or backing it up by a specified number of seconds or words. The media player or the graphical user interface may present the user with a set of controls to permit the user to perform these functions. Alternatively, the user may use foot pedals to control the playback of the audio signal.
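Backing up the playback by a number of seconds or a number of words might be computed as in the following sketch. It assumes that the start time of each word is known and sorted; the function names `back_up_seconds` and `back_up_words` are hypothetical.

```python
from bisect import bisect_right
from typing import List


def back_up_seconds(position: float, seconds: float) -> float:
    """New playback position after backing up, clamped at the start."""
    return max(0.0, position - seconds)


def back_up_words(word_starts: List[float], position: float, count: int) -> float:
    """Back up `count` words from the word playing at `position`.

    Returns the start time of the target word, clamped to the first word.
    """
    if not word_starts:
        raise ValueError("no word timings available")
    # Index of the last word that started at or before the position.
    current = max(0, bisect_right(word_starts, position) - 1)
    target = max(0, current - count)
    return word_starts[target]


# Hypothetical word start times, for illustration only.
starts = [0.0, 0.4, 1.1, 1.3, 1.4]

print(back_up_seconds(1.35, 1.0))
print(back_up_words(starts, 1.35, 2))  # 0.4
```

Backing up by words rather than seconds resumes playback at a word boundary, which may be more convenient when the transcription is displayed alongside the audio.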
[0059] The user may also be permitted to alter the HTML document in some manner and save the altered document back in metadata database
[0060] At some point, the user may identify this document or another document as containing one or more portions that the user desires to translate.
[0061] In translating the audio signal, the user performs three separate tasks: selection
[0062] Upon selection of product button
[0063] Upon receipt of the return message, client
[0064] GUI
[0065] Returning to
[0066] The media player may then retrieve the corresponding portion of the original audio signal (act
[0067] As the media player plays back the original audio, client
[0068] As the audio plays, the user may type in, or otherwise provide, a translation of the audio signal (act
[0069] When configuration button
[0070] In an implementation consistent with the principles of the invention, the functions of play/pause button
[0071] Returning to
[0072] Systems and methods consistent with the present invention provide mechanisms that aid a human in translating an audio signal to another language. The systems and methods provide improvements at all three stages of the translation process. For example, the systems and methods provide a transcription of the audio signal to the translator. This helps the translator in selecting a segment of the audio signal to translate because it is faster to skim through text than it is to listen to an entire audio signal. The translator may also use search criteria to find relevant text. This makes it possible to easily monitor a very large number of audio sources.
[0073] The systems and methods also present a transcription of the audio signal on the same screen that the translator uses to provide the translation. The systems and methods visually synchronize the playback of the audio signal with the text in the transcription. This helps the translator in translating the audio signal. For example, this gives the translator two indications (audible and visual) of what a particular word might mean, which increases the speed of translation. More people can read a language and translate it than can translate audio alone.
[0074] The systems and methods also permit the translator to publish the translation results anywhere that is useful. This helps the translator in making the translation results available to those who would be interested in them.
[0075] The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.
[0076] For example, it has been disclosed that a media player retrieves the original audio when instructed by a user. In other implementations, the original audio may be transmitted to the user along with the translation of the audio and any associated metadata. In yet other implementations, more than the requested portion of the original media may be transmitted to the user in anticipation of its later request by the user.
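A simple form of the anticipatory transmission mentioned above is to extend the requested time range by a fixed look-ahead, without exceeding the length of the stored media. The sketch below assumes times in seconds; the function name `padded_request` and the 60-second look-ahead are hypothetical choices for the example.

```python
from typing import Tuple


def padded_request(
    start: float, end: float, lookahead: float, media_length: float
) -> Tuple[float, float]:
    """Extend a requested (start, end) range by `lookahead` seconds.

    The extended end is clamped to the total length of the media.
    """
    if not (0 <= start < end <= media_length):
        raise ValueError("requested range is outside the media")
    return start, min(end + lookahead, media_length)


# Hypothetical one-hour recording, for illustration only.
print(padded_request(300.0, 480.0, 60.0, 3600.0))    # (300.0, 540.0)
print(padded_request(3500.0, 3590.0, 60.0, 3600.0))  # (3500.0, 3600.0)
```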
[0077] It may also be possible to translate the audio signal or the transcription of the audio signal using automated techniques. In this case, the translation may be presented to a human translator, possibly along with the transcription and/or the original audio signal, to aid the translator in preparing an accurate translation of the audio signal.
[0078] Further, while aspects of the invention have been described as operating upon speech within an audio signal, these aspects may also operate upon speech contained within a video signal. Further still, while aspects of the invention have been described in reference to a client-server configuration over a network, systems and methods for translating in a manner consistent with the present invention may also be implemented locally on a single computer.
[0079] No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. The scope of the invention is defined by the claims and their equivalents.