United States Patent 3846586

An improved device for the automatic, instantaneous conversion of speech into writing, whose first step is to separate the speech into its various types of components, such as fricatives, vowels, plosives, nasals, etc., using only a single oral input. This distinguishes it from the original talkwriter development disclosed in Ser. No. 1,739, now U.S. Pat. No. 3,646,576, wherein two inputs were used, one from the throat and one oral. Once the appropriate components of the speech have been separated, switches, gates and other circuit mechanisms actuate further circuitry and a typewriter that records the input sounds. The device accomplishes the overall goals and objectives achieved in the aforementioned U.S. Pat. No. 3,646,576, except that a replacement circuit for FIG. 2 thereof and a modification of FIG. 9 thereof are disclosed and provided herein. Thus the invention allows a single oral input to be analyzed in real time and its verbal message to be printed out.

International Classes:
G10L15/00; (IPC1-7): G10L1/16
US Patent References:
3395249, Speech analyzer for speech recognition system, 1968-07-30, Clapper
3225141, Sound analyzing system, 1965-12-21, Dersch
3198884, Sound analyzing system, 1965-08-03, Dersch
3158685, Synthesis of speech from code signals, 1964-11-24, Gerstman

Other References:

IBM Technical Disclosure Bulletin, Harper, Vowel Separation by Time Ratio Measurements, 3/1963.
IBM Technical Disclosure Bulletin, Ferrier, Plosive Measurement, 3/1963.
IBM Technical Disclosure Bulletin, Harper, Friction Voicing Separator, 2/1962.
IBM Technical Disclosure Bulletin, Dersch, Voiced Sound Detector, 8/1962.
Primary Examiner:
Claffy, Kathleen H.
Assistant Examiner:
Leaheey, Jon Bradford
Attorney, Agent or Firm:
Misegades, Douglas & Levy
What is claimed is

1. Sound separator device for speech-to-writer apparatus, using single-channel input through an amplifier circuit for connected speech, and separating in real time the speech elements into each of the types of speech sounds of, for example, English: (1) vowels and semi-vowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, (6) voiced stops; said separator device comprising (FIG. 1):

2. Sound separator device for speech-to-writer apparatus comprising:

3. The invention according to claim 2, wherein said sound separation means (FIG. 1) has sensor means, band-pass filters, rate generators, and comparators responsive to the oral input signals from said microphone for passing through a single channel, and sorting speech sounds according to said six categories.

4. The invention according to claim 2, wherein four networks comprising said fricative, nasal and vowel separators include:

5. The invention according to claim 4, wherein a phoneme sequence sensor and designator is provided with a series of time chopper units, each for receiving inputs and for thence providing outputs to said phoneme sequence sensor and designator, said time choppers receiving input signals identified, respectively, as fricatives, stops, vowels and nasals;


The present invention relates to apparatus and a method for analyzing a single-track voice signal, which is detected and processed for its component speech sounds; more particularly, the invention is directed to eliminating dual microphone inputs, that is, one microphone for voice and another for throat signals. For some purposes, the single-track voice input method of the present invention substitutes measurement of energy fluctuations and levels in the 300 to 800 Hz band for tactile sensing of phonation. It does not, however, require this substitute to serve as the means of differentiating between the types of speech sounds, as tactile sensing does in the dual-track throat-microphone method described in the parent application.


To aid the examiner in understanding the background of the present invention, as distinguished from the invention of the parent application, Attachment A, titled "Delineation of Types of Speech Sounds from Single-Track Input," is submitted herewith; it is a narrative write-up exemplifying the invention and its background, leading to a further understanding of the present invention.


An object of the invention is to provide real-time detection and analysis of speech sounds by circuits that analyze and switch the sounds according to their manner of production, taken in this application as six categories, for separate subsequent analysis to identify the specific phonemes within each category. The six categories are vowels (and semi-vowels), nasals, unvoiced fricatives, voiced fricatives, unvoiced stops and voiced stops. The switching and analysis contemplated in the present invention are accomplished from a single-track oral input, such as a single microphone, recorder source or amplifier device, as are well known in the art.

A further object and accomplishment of the invention is to provide means for distinguishing between voiced stops and unvoiced and unreleased stops (plosives), which, when necessary, are detected independently of voicing by circuit means that measure the rate of change of signal strength and time durations.
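The circuits themselves are not detailed at this point. As a hedged sketch only, assuming per-frame signal-strength readings taken every 0.01 s and normalized to the range 0 to 1, plus a per-frame voicing flag, a released stop can be flagged when a quiet closure of plausible duration is followed by a rapid rise in signal strength. The thresholds (quiet level 0.05, closure of 0.02 to 0.15 s, rise of at least 20 units per second) and the function name `detect_stops` are illustrative assumptions, not values from the patent, and unreleased stops, which lack a burst, are not covered.

```python
def detect_stops(levels, voiced, frame_s=0.01, quiet=0.05,
                 min_gap_s=0.02, max_gap_s=0.15, min_rise=20.0):
    """Return (frame_index, label) for each hypothesized stop release.

    Hypothetical sketch: a stop is a quiet closure whose duration lies in
    [min_gap_s, max_gap_s], followed by a rise in strength of at least
    min_rise per second. The voicing flag at the release frame decides
    between voiced and unvoiced stops.
    """
    events = []
    quiet_run = 0  # consecutive quiet frames immediately before the current one
    for i, level in enumerate(levels):
        if i > 0 and level >= quiet:
            rise_rate = (level - levels[i - 1]) / frame_s  # rate of change of strength
            gap_s = quiet_run * frame_s  # durational timing of the closure
            if rise_rate >= min_rise and min_gap_s <= gap_s <= max_gap_s:
                events.append((i, "voiced stop" if voiced[i] else "unvoiced stop"))
        quiet_run = quiet_run + 1 if level < quiet else 0
    return events


# A 50 ms closure followed by an abrupt burst at frame 7.
levels = [0.6, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.7]
voiced = [True, True, False, False, False, False, False, False, True]
print(detect_stops(levels, voiced))
```

A closure that lasts too long is treated as silence followed by a fresh onset rather than as a stop, which is one way durational timing separates stops from pauses.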


The above and other objects and advantages of the invention will become apparent upon full consideration of the following detailed description and accompanying drawings in which:

FIG. 1 is a circuit schematic diagram of a sound separator apparatus, or unit of a transducer module, for separating the sounds from a given single sonic source into characteristic types; and

FIG. 2 is a block diagram of the transcriber module as modified from FIG. 9 of application Ser. No. 1,739, now U.S. Pat. No. 3,646,576, according to the preferred and best mode of the invention.


Referring now to the drawings, FIG. 1 shows a sound separator apparatus. It uses means identical to those of the parent Talkwriter apparatus for processing stops (plosives), whether voiced, unvoiced or undifferentiated, and the same method and apparatus for distinguishing silence from the speech elements. It works differently, however, in substituting a component identified as the fricative, nasal and vowel separator 10 for processing nasals, vowels (and semi-vowels) and fricatives.

Since the present invention is restricted to a single-track oral input and therefore cannot use a comparator of separate inputs, it differentiates the sustained speech elements principally by measuring and comparing rates of change, or by identifying and comparing intensities in certain bandwidths, as described below.

A 300 Hz high-pass filter 13 is inserted into the input oral signal line 12 to remove any residual low-level line hum, and the output of the high-pass filter is connected by a conductor 37 to a sensor element 32 and to a "vogad" 36. The sensor element 32 amplifies the input signal applied to it for subsequent analysis of its kinds of speech sounds, so that the sounds may be shunted or conveyed to different analytical circuits according to the kinds of speech determined and their constituent components, for detection of individual speech sounds in the several categories.
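Filter 13 is an analog component. As a hedged digital illustration only, the sketch below implements a first-order high-pass filter with a 300 Hz cutoff, assuming an 8 kHz sample rate; the names `high_pass_300` and `tone` are hypothetical. It shows the intended effect: 60 Hz line hum is strongly attenuated while a 1 kHz tone in the speech band passes largely unchanged.

```python
import math


def high_pass_300(samples, fs=8000.0, cutoff=300.0):
    """First-order RC-style high-pass filter (hypothetical stand-in for filter 13)."""
    rc = 1.0 / (2.0 * math.pi * cutoff)  # RC time constant for the chosen cutoff
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        # Discrete form of the RC high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def tone(freq, fs=8000.0, n=8000):
    """One second of a unit-amplitude sine at `freq` Hz."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


hum = high_pass_300(tone(60.0))
speech_band = high_pass_300(tone(1000.0))
# Compare steady-state peaks (second half of each output).
print(max(abs(v) for v in hum[4000:]), max(abs(v) for v in speech_band[4000:]))
```

A single pole is enough to show the principle; a practical front end would likely use a steeper filter.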

From the sensor device 32, an output is coupled to a linear amplifier or "vogad" 36, from whence it is fed to a set of switches or gates 40,42,44,46,48,50, so that each passes the oral input when appropriate for its category or kind of speech sound being analyzed, as described in the application Ser. No. 1,739, now U.S. Pat. No. 3,646,576. These categories are six in number, and relate in the following manner to the switches 40-50:

Switch 40 -- unvoiced stops

Switch 42 -- voiced stops

Switch 44 -- unvoiced fricatives

Switch 46 -- voiced fricatives

Switch 48 -- nasals

Switch 50 -- vowels (and semi-vowels)

By these divisions or separations of conventional electric analogs of an oral input, there are derived signals from switches 40-50 that provide means of pre-switching nasals, vowels and voiced fricatives.
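The switch assignments above amount to a lookup from category to gate. A minimal sketch, assuming each short frame of input has already been labelled with one of the six categories (or `None` for silence), routes every frame index to exactly one gate. The category strings and the function name `route_frames` are hypothetical.

```python
# Category-to-switch assignments as listed for FIG. 1.
GATES = {
    "unvoiced stop": 40,
    "voiced stop": 42,
    "unvoiced fricative": 44,
    "voiced fricative": 46,
    "nasal": 48,
    "vowel": 50,  # vowels and semi-vowels
}


def route_frames(labels):
    """Collect, for each gate, the indices of the frames it would pass."""
    outputs = {gate: [] for gate in GATES.values()}
    for i, label in enumerate(labels):
        if label is None:  # silence or unclassified: no gate opens
            continue
        outputs[GATES[label]].append(i)
    return outputs


print(route_frames(["vowel", "nasal", None, "vowel", "unvoiced stop"]))
```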

Signal means for opening each of the gates or switches 40-50 are also shown in FIG. 1. The oral input line is also supplied by a connector 87 to a bandpass filter 15 that passes 300 to 800 Hz. An output from this filter indicates the presence of low-frequency sounds at the fundamental frequencies of voice activity, so the presence or absence of voicing is shown by the existence or null of this output. The OFF and ON indications are supplied by connectors 72 and 70, respectively, both to the stops-or-silence detector 74 and, by means of connectors 78 and 76, to the unvoiced fricative gate 44 and the voiced fricative gate 46, respectively. The actual filtered output 19 is also supplied to three additional components 43, 45, 47 as part of the identification of nasal sounds, described below. A rate generator 17 is also connected to the output of filter 15 to measure the rate of change of signal amplitude in this band; this differential signal is supplied by connector 80 to the stops-or-silence detector 74.
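The voicing detector and rate generator 17 can be approximated digitally. A minimal sketch, assuming 8 kHz samples cut into 20 ms frames, measures energy in the 300 to 800 Hz band with a direct DFT in place of filter 15, reports voicing ON when that energy exceeds an assumed threshold, and takes the frame-to-frame change in band amplitude as the rate signal. The function names and the 0.01 threshold are hypothetical.

```python
import math


def band_energy(frame, fs, lo, hi):
    """Relative energy in the [lo, hi] Hz band: sum of |DFT bin|^2 / n^2 over the band's bins."""
    n = len(frame)
    df = fs / n  # frequency spacing of DFT bins
    k_lo = max(1, math.ceil(lo / df))
    k_hi = min(math.floor(hi / df), n // 2)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        total += (re * re + im * im) / (n * n)
    return total


def voicing_track(frames, fs, threshold=0.01):
    """Voicing ON/OFF per frame plus the rate of change of 300-800 Hz amplitude."""
    frame_s = len(frames[0]) / fs
    energies = [band_energy(f, fs, 300.0, 800.0) for f in frames]
    amps = [math.sqrt(e) for e in energies]
    voiced = [e > threshold for e in energies]  # ON vs OFF, like connectors 70 and 72
    rates = [0.0] + [(amps[i] - amps[i - 1]) / frame_s for i in range(1, len(amps))]
    return voiced, rates


def sine(freq, n=160, fs=8000.0):
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


# A voiced-like 500 Hz frame, a 3 kHz hiss-like frame, then silence.
voiced, rates = voicing_track([sine(500), sine(3000), [0.0] * 160], 8000.0)
print(voiced, rates)
```

The direct DFT keeps the example dependency-free; it is slow and only meant to show which band is being measured.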

The oral input signal also is supplied through connector 87 to a network consisting of three bandpass filters 21,23,25 and a comparator 29 which serve to delineate fricative sounds. The connector is supplied to the three different bandpass filters simultaneously: filter 21 passes 600 to 1,700 Hz; filter 23 passes 1,700 to 2,100 Hz; and filter 25 passes 6,500 to 10,000 Hz. The outputs through filters 23 and 25 are added in connector 27 and then supplied to comparator 29. In comparator 29, these combined amplitudes are divided by the output of filter 21, and if the result is unity or greater, the comparator passes a signal through switch 31 by connector 32a to fricative gates 44,46 showing the presence of fricative sounds. Voicing ON or OFF from filter 15 determines the opening of gate 44 or gate 46.
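The comparator logic for fricatives can be written as a small decision function. This sketch assumes the outputs of filters 21, 23 and 25 are already available as non-negative amplitudes, and it folds in the condition on switch 31 described in the next paragraph (a rate-of-change output 39 from rate generator 37) as a boolean. The function name and argument names are hypothetical.

```python
def fricative_gate(a21, a23, a25, voiced, rate_change):
    """Return 44 (unvoiced fricative), 46 (voiced fricative) or None.

    a21, a23, a25: amplitudes from the 600-1,700 Hz, 1,700-2,100 Hz and
    6,500-10,000 Hz filters. voiced: voicing ON from filter 15.
    rate_change: output 39 of rate generator 37 (assumed boolean).
    """
    if not rate_change:  # switch 31 passes nothing without output 39
        return None
    high = a23 + a25  # outputs of filters 23 and 25 added in connector 27
    if high <= 0.0:
        return None
    if a21 > 0.0 and high / a21 < 1.0:  # comparator 29: ratio below unity
        return None
    return 46 if voiced else 44


print(fricative_gate(1.0, 0.5, 0.6, voiced=False, rate_change=True))
```

Zero energy in the 600 to 1,700 Hz band is treated as an unbounded ratio, which avoids dividing by zero.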

Connector 87 also supplies the oral input signal to a network that distinguishes nasals and vowels from fricatives. Nasal distinction is a two-stage process, described below, and vowels are identified in the absence of the second-stage differentiation of nasals. The vowel-nasal network consists of two bandpass filters 33, 35 and a rate generator 37. The input is supplied from connector 87 simultaneously to two narrow bandpass filters: filter 33 passes 380 to 420 Hz, which suits low voices, and filter 35 passes 680 to 720 Hz, which suits high voices. Since neither band need exclude the other, their outputs are added, to show the presence of activity in either bandwidth, and supplied to the rate generator 37. The rate generator 37 takes readings at intervals of 0.01 second or less, sensing in each succeeding period whether there is a measurable disruption of the steady energy level that is characteristic of a vowel or nasal. This rate generator has two outputs 39, 41. Output 39 is connected to switch 31 so as to allow opening of the fricative gates 44 and 46, by means of connector 52a, only when the generator 37 indicates a rate of change, that is, in the absence of a vowel or nasal. Output 41 passes a signal permitting either the nasal gate 48 or the vowel gate 50 to open when there is an input whose rate shows no change, indicating a vowel or nasal, during two or more successive measured periods of 0.01 second or less.
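The behavior of rate generator 37 can be sketched as a steadiness test on successive readings. This assumes one reading per 0.01 s period of the summed outputs of filters 33 and 35, treats a change within an assumed 10% tolerance as "no measurable disruption", and requires two successive steady periods before output 41 goes true. The tolerance, the presence floor and the function name are hypothetical.

```python
def rate_generator_37(levels, tol=0.1, floor=1e-3):
    """Return per-period (output 39, output 41) flags for a list of readings.

    Output 39: a measurable disruption (rate of change) with input present.
    Output 41: steady input for two or more successive periods.
    """
    out39, out41 = [], []
    steady_run = 0  # successive periods without a measurable disruption
    prev = None
    for level in levels:
        present = level > floor
        if prev is not None and present and abs(level - prev) <= tol * prev:
            steady_run += 1
        else:
            steady_run = 0
        out39.append(present and steady_run == 0)  # rate-of-change output
        out41.append(steady_run >= 2)              # steady vowel/nasal output
        prev = level
    return out39, out41


print(rate_generator_37([1.0, 1.02, 0.99, 1.01, 0.3, 1.0]))
```

After a single steady period neither output is asserted, leaving a brief undecided interval before the vowel-nasal path is enabled.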

Connector 87 is also supplied to still another network that gives positive identification of nasal sounds. It consists of a bandpass filter 43, a high-pass filter 47, a comparator 45, and a switch 49. Bandpass filter 43 passes 800 to 1,300 Hz, and its output energy is compared in comparator 45 with that derived simultaneously from filter 15, mentioned earlier, which passes 300 to 800 Hz. When the output of filter 15 is two or more times stronger than the output of filter 43, a signal is passed to switch 49 as a possible positive indication of a nasal sound (including /l/). The final use of input 87 is to supply the high-pass filter 47, which passes frequencies above 1,700 Hz. Absence of any output from it activates switch 49, so that the indication of a nasal sound passes through connector 54a to the nasal gate 48. While the nasal gate is thus activated, an activity signal is passed by connector 51 to the vowel gate, switch 50, so that the vowel gate is inactivated only when the nasal gate is open. Specific vowels will be identified, of course, only when their individually required parameters are met in subsequent phoneme analysis, which is beyond the scope of this invention but is described in the parent case.
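The two-stage nasal test and the vowel-gate inhibition can be sketched as follows. This assumes amplitudes from filter 15 (300 to 800 Hz), filter 43 (800 to 1,300 Hz) and high-pass filter 47 (above 1,700 Hz) are available, takes "no output" from filter 47 to mean an amplitude at or below an assumed floor, and uses a boolean for the permission signal on output 41 of rate generator 37. The function name and floor value are hypothetical.

```python
def nasal_vowel_gates(a15, a43, a47, vowel_or_nasal, hp_floor=1e-3):
    """Return (open nasal gate 48, open vowel gate 50)."""
    # Comparator 45: the 300-800 Hz output is at least twice the 800-1,300 Hz output.
    candidate = a15 > 0.0 and a15 >= 2.0 * a43
    # Switch 49: confirmed only when filter 47 (above 1,700 Hz) shows no output.
    nasal = vowel_or_nasal and candidate and a47 <= hp_floor
    # Connector 51: the vowel gate is inhibited while the nasal gate is open.
    vowel = vowel_or_nasal and not nasal
    return nasal, vowel


print(nasal_vowel_gates(1.0, 0.3, 0.0, vowel_or_nasal=True))
```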

The stops switches 40, 42 and the fricative switches 44, 46 are activated not by the processes described above alone, but in conjunction with and in response to additional processes. The stops-or-silence indicator 74 is almost identical to the corresponding component of the Talkwriter apparatus and differs in only one respect: its inputs 70, 72, 80 derive not from a throat sensor but from the network associated with the 300 to 800 Hz bandpass filter 15 described above. Its other inputs are a signal conductor 82 from the oral differentiator 62, a signal conductor 84 for oral OFF, a signal conductor 86 for oral ON, and an oral input from vogad 36 on conductor 88.

A threshold adjustment means 90 is connected from the output of the sensor 32 to a stress signal terminal 92.

There are seven general output signals from the stops-or-silence indicator 74: three are applied to the unvoiced stops switch 40, over conductors 94 (two of them) and 169; two are applied to the voiced stops switch 42, over conductors 96, 96; one goes to a silence terminal 98; and one, from a switch 140 that shows the probable presence of a stop, passes by conductor 128 to close the unvoiced fricatives gate 44.

The output 169, which serves as an indication of an undifferentiated unvoiced stop, also passes through the unvoiced stop switch 40 and then into the detection circuits of other components of the Talkwriter apparatus. The outputs through gates 40-50, namely 88, 88, 42a, 44a, 46a, 48a and 50a, also serve other components of the Talkwriter apparatus. The output of the sensor 32, representing oral input signal strength, is supplied through connectors 58 for use in other components of the Talkwriter apparatus where an indication of oral input signal strength is required. Finally, the output of the rate generator 62 for the oral input signal is supplied through connector 82 for similar use in other components of the Talkwriter apparatus where that information is needed.

Additional embodiments of the invention in this specification will occur to others and therefore it is intended that the true spirit of the invention be limited only by the appended claims and not by the embodiment described hereinabove. Accordingly, reference should be made to the following claims in determining the true spirit of the invention.