Description:
This invention relates to Voiceprint analysis, and more particularly to an automatic identification system by matching the voiceprint signals of an individual with that of a large storage of prerecorded voiceprint signals, for example, Nationally stored voiceprint signals, in the same order of prerecorded fingerprint patterns. The main object is to provide a highly reliable electronic arrangement capable of automatically selecting the correct voiceprint signals from a large number of sequentially fed prerecorded signals at high speed. A corollary object is to provide related arrangements which may be used for the purpose of command operation by a specific individual in the midst of a number of speakers.
In all voice recognition systems the most difficult problem involved is the enormous variation that occurs in the speaking voice. These variations, however, are provided by Nature, so that an almost infinite amount of distinguishable information can be derived therefrom. But it is not a simple process to locate a desired item of information in these unendingly varying complexities, and therefore, in any practical realization, it becomes an absolute must that these variations are first normalized, so that the desired information can be picked out by a standard matching process with that of some predetermined parameters. In such a practice, the normalization must be highly precise, and highly stable without requiring any error-adjustments whatsoever. This is achieved in the ultimate accuracy by the switching arrangement of FIG. 1, wherein the group of signals contained in the arriving complex signal are regrouped in a reference arrangement at the outputs of a bank of channels in block 24, whereby the elemental signals composing the complex signal are rendered distinguishable for selection in any desired fashion. The regrouped signals at the channel outputs are then automatically matched with pre-stored signals in an arrangement of FIG. 2, for final identification of a particular voice desired.
BRIEF OF THE SIGNAL REGROUPING USED
A bank of channels is used, arranged in a predetermined numerical order, each one of which is prearranged with a plurality of signal-admitting inputs, and a plurality of signal-switching inputs, respectively. The incoming group of signals are applied to the corresponding ones of the plurality of signal-admitting inputs of each one of the plurality of channels, so that any one of the applied signals can be admitted to the output of any one of the channels by the operation of a respective signal-switching input. A plurality of prearranged combinations of signal-regrouping control signals are then applied (at high speed) sequentially to the plurality of signal-switching inputs until the detected signal derived from the lowest (pitch) frequency appears at the first channel output, representing the required reference signal regrouping. This signal regrouping is held in steady state, while at the same time the switching sequence of signal regrouping combinations starts from a reference beginning for continual pitch frequency hunting until the pitch frequency changes to establish a new signal regrouping at the channel outputs.
SPECTRUM NORMALIZATION
In reference to variations of resonances in the sound spectrum of a speaking voice, a group of resonances representing a specific phonetic sound will vary in different frequency regions of the sound spectrum. In all previously proposed systems for frequency normalization, the system must provide a highly complicated follow-up system to determine the mutually related frequency ratios between the arriving group of resonances. For example, assuming that the various resonances in the sound wave are harmonically related to the lowest (pitch) frequency, the values may be shown as, F_1 + f_o, F_2 + 2f_o, F_3 + 3f_o, -- F_n + nf_o, where (f_o) is the varying frequency component; and (F_1 + f_o) is the fixed reference pitch frequency component. Obviously, the artificial generation of the (F) components necessary for obtaining the shown values would involve critical and undesired control systems. But one method of signal conversion may be substituted by another without changing the specific information to be analyzed. Thus, instead of changing the varying frequencies in the sound wave into fixed frequencies, we may first derive detected signals from the pass-band filters (for subdividing the sound into sub-bands), and regroup them in an arrangement of numerical order, such as, 1, 2, 3, -- n, which by simulation may assume the values as: 1 = F_1 + f_o, 2 = F_2 + 2f_o, 3 = F_3 + 3f_o, and n = F_n + nf_o, where (1) represents the fixed reference numeral (fixed reference pitch frequency). By such numerical conversion (digital conversion, as far as frequency components are concerned, without changing the amplitude components), we may now deal with on-and-off conditions which can be established in the highest order of control and stability with the present day digital techniques. Accordingly, the novel switching system for such numerical conversion, as used in the present invention will now be described by way of the accompanying illustrations, wherein:
FIGS. 1 and 2 are block diagrams of the voiceprint identifying arrangements, according to the invention; FIG. 3 is a graphical diagram showing how different groups of signals are regrouped at the channel outputs, according to the invention; FIG. 4 shows the special center frequency sub-divisions of the pass-band filters, as used in the present invention; FIG. 5 is a numerical chart showing how the detected signal outputs of the pass-band filters are switched linearly to the outputs of the numerically arranged channels, in accordance with the invention; FIG. 6 is the switching arrangement used herein; and FIG. 7 is an amplitude equalization arrangement in accordance with the invention.
GRAPHICAL REPRESENTATION OF SIGNAL REGROUPING AT CHANNEL OUTPUTS
Referring to the graphical illustration of FIG. 3, assume a bank of channels, as represented by the blocks 1 through n at D, each of which has a plurality of inputs, as represented by the blocks at A, B and C, drawn under each channel. In this arrangement, assume that the second and third shaded blocks (at A) are in the second and fourth multiples of the first shaded block, respectively. Similarly, when the group of signals are distributed along the shaded inputs of the channels, as at B, it is also seen that the second and third shaded blocks are in the second and fourth multiples of the first shaded block, respectively. In a still further exemplary arrangement, when the input group of signals are distributed along the shaded inputs, as at C, it is further seen that the second and third shaded blocks are in the second and fourth multiples of the first shaded block, respectively. By such examples, accordingly, we may regroup any one of these three groups of input signals to a reference numerical region at the outputs of the numerically arranged channels, as at D, and we may derive the numerical ratios in a simpler mode than would be required by hunting the numerical locations in the blocks at A, B or C. At this point, however, the problem remains as to how to determine what combination of switching is required for each of the groups of signals at A, B or C, in order to obtain the reference signal regrouping at D. This is done simply by a prearranged matrix of a plurality of switching combinations, which are applied to the bank of channels sequentially until the signal derived from the lowest (pitch) frequency in the original voice is shifted to the first channel output for the required switching at D.
SPECIAL CENTER FREQUENCY PASS-BAND FILTERS
In reference to the illustration of FIG. 3, the information contained in complex waves, such as Voiceprints, depends on greater detail than the simple harmonic relations shown, and it requires further sub-divisions between the harmonic intervals. Thus in order to obtain high accuracy of signal regrouping without causing any cross switching of the input signals to the channels, the center frequencies of the sub-dividing pass-band filters are arranged as in FIG. 4, wherein the sub-divisions are similar to the standard musical scale. In this arrangement, it will be noted that the sub-divisions are arranged in a series of numerals at like intervals in harmonic successions corresponding to the numerals in preceding intervals. Thus, the numerical ratios of the second through the thirteenth numerals with respect to the first numeral are the same as the numerical ratios of the 14th through the 25th numerals with respect to the 13th numeral, as an exemplary demonstration. Such a numerical arrangement simplifies the actual channel switching arrangement, because it requires linearly sequenced numerical transfer without cross coupling of any of the pass-band outputs. This is shown in greater clarity by the numerical chart in FIG. 5.
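The repeating-ratio property of the filter center frequencies can be illustrated with a short sketch. The actual values of FIG. 4 are not reproduced here; the sketch assumes an equal-ratio scale with 12 steps per doubling of frequency (an assumption suggested by the comparison to the musical scale), and the starting frequency of 100 Hz and the function name center_frequencies are purely illustrative.

```python
def center_frequencies(f1, count, steps_per_octave=12):
    """Return `count` center frequencies spaced by a constant ratio.

    Assumption: equal-ratio steps, 12 per doubling of frequency.
    The real FIG. 4 values may differ.
    """
    return [f1 * 2 ** (k / steps_per_octave) for k in range(count)]


# 25 hypothetical filters starting at an illustrative 100 Hz.
freqs = center_frequencies(100.0, 25)

# Ratios of numerals 2..13 with respect to numeral 1.
ratios_first = [freqs[k] / freqs[0] for k in range(1, 13)]
# Ratios of numerals 14..25 with respect to numeral 13.
ratios_second = [freqs[k] / freqs[12] for k in range(13, 25)]

# Both interval groups share the same ratios (up to rounding error).
same = all(abs(a - b) < 1e-9 for a, b in zip(ratios_first, ratios_second))
print(same)
```

With such spacing, a harmonic pattern keeps the same numeral offsets wherever the pitch falls, which is why a plain linear shift of filter outputs suffices for regrouping.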
SIGNAL REGROUPING NUMERICAL CHART
The signal regrouping arrangement is shown in greater clarity by the numerical chart in FIG. 5, wherein the top row of the numerals represent the channels, and the rows of numerals below represent the sequence of the numerals at the left of the sub-band frequencies of FIG. 4. For example, in the first row (FIG. 5), all of the detected outputs of pass-band filters (starting from filter number-1) are applied to the inputs of the channels starting from channel number-1. In the second row, the detected filter outputs starting from the number-2 filter are applied to all of the channels starting from the channel number-1, and so on. As stated in the foregoing, such simplicity of switching sequence becomes inherently accurate, as long as the harmonic sequence at like intervals is considered in sub-dividing the sound spectrum, because the number of sub-band divisions and center frequencies of the pass-band filters may be arranged other than the frequencies shown in FIG. 4 without affecting the required accuracy.
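The row-by-row shift described for the chart may be sketched in software as follows. This is only an illustration of the linear transfer, not the circuit itself; the function names regrouping_chart and regroup, and the use of None for a channel that receives no filter output, are assumptions made for the example.

```python
def regrouping_chart(num_filters, num_channels):
    """Build a chart in the manner described for FIG. 5.

    Row r (0-based) lists, for each channel, the filter number whose
    detected output is applied to it. None marks a channel with no filter.
    """
    chart = []
    for start in range(1, num_filters + 1):
        row = [
            start + ch if start + ch <= num_filters else None
            for ch in range(num_channels)
        ]
        chart.append(row)
    return chart


def regroup(detected, row_index):
    """Shift detected filter outputs so filter (row_index + 1) feeds channel 1.

    Channels left without a filter are filled with 0.0.
    """
    return detected[row_index:] + [0.0] * row_index


chart = regrouping_chart(num_filters=5, num_channels=4)
for row in chart:
    print(row)

# Lowest active signal is at filter 3, so the third row (index 2) is used.
print(regroup([0.0, 0.0, 0.8, 0.0, 0.5], 2))
```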
ANALOG CHANNEL SWITCHES
The arrangement of analog channel switches is shown in FIG. 6. The detected signals from the pass-band filters are applied to the drain electrodes of transistors Q1 through Q4, respectively. The source electrodes of these transistors are connected to ground in series with the output resistors R1, R2, R3, and Rn, respectively, which represent the channel outputs. In accordance with the numerical chart of FIG. 5, any one of the detected signals from f 1, f 2, f 3 (FIG. 1) may be coupled to the resistor R1 of channel-1. This is done by the additional transistors Q5, Q8, Q10, the source electrodes of which are all connected in parallel with the source electrode of Q1. Thus, when the first distributor pulse is applied to the gate electrode of Q1 the detected signal from f 1 is admitted to the output resistor R1. When the second distributor pulse is applied to the gate-electrode of Q5 (also to those connected in parallel) the detected signal from f 2 is admitted to the output resistor R1 of channel-1, and so on. Thus, as the output signals from the set-reset flip-flops in the R-S blocks 17-19 in FIG. 1 are applied to the parallel connected gate-electrodes of the transistors in FIG. 6, the detected signals from the pass-band filters in FIG. 1 are regrouped across the channel output resistors shown, in accordance with the numerical chart of FIG. 5.
Having described the details of parts for the specific signal conversion system utilized herein, the actual system for Voiceprint identification will now be described by the block diagrams of FIGS. 1 and 2, as in the following:
SIGNAL CONVERSION SYSTEM
I have described novel signal conversion systems in my related patents, for example, U.S. Pat. No. 3,622,706 and 3,659,051, and reference may be made to these patents. For the specific purpose, however, and referring to FIG. 1, the voice sound wave in block 1 is applied to the passband filters in blocks 2-4 in parallel, and their outputs are first detected in blocks 5-7, respectively, and further applied to the signal-admitting inputs of the switching channels in block 24. The outputs of filters 2-4 are also applied to the sense-amplifiers in blocks 8-10, respectively, which are provided with set-reset flip-flop outputs, so that they produce "1" level output signals when their inputs receive useful signals from the filter outputs above a threshold level. The outputs of these sense-amplifiers are applied to the first inputs of gates 11-13, respectively, and the second inputs of these gates are excited by the sequential "1" level pulses of the pulse distributor in block 25, so that only those gates which have received simultaneous signals at their first inputs from the respective sense-amplifiers operate by the distribution pulses. Thus assuming that the distributor starts applying sequential pulses at "1" level to the input of gate 11, and that this gate has also received "1" level signal from the sense-amplifier 8, it applies an operating pulse to the one-shot 14, which in turn operates the set-reset flip-flop in block 17 into set state by way of a-c coupling (a-c and d-c couplings are available in integrated circuits). The set state output of flip-flop 17 is a-c coupled to the one-shot in block 20, which in turn operates and applies "0" level pulse to the multi-input gate in block 23 for operation. The output "1" level pulse of gate 23 is inverted into "0" level pulse, and applied in parallel through d-c coupling to the reset inputs of the set-reset flip-flops in blocks 17-19 for reset operating states. 
At this point, however, while the flip-flops 17-19 are driven into reset operating states, the flip-flop in block 17 remains in set operating state by reason that the d-c coupled pulse at its set input has a longer pulse period than the d-c coupled pulse at its reset input, as illustrated by pulse waveforms under the one-shots 14-16 and 20-22. Thus, the set output of flip-flop 17 is amplified in block 26 (this amplification might be necessary if the different types of integrated circuit devices available commercially, as utilized herein, are not found to be compatible one with another), and a selected signal-regrouping combination in the matrix of block 27 is applied to the channel switches in block 24 for the required signal regrouping at their outputs.
For continuous signal-regrouping operation, the distributor 25 must be reset for a new start of signal regrouping. At the start of this new pulse distribution, however, the pass-band filter outputs may have changed, and accordingly, the flip-flop outputs of the sense-amplifiers 8-10 must also be reset simultaneously with the resetting of the distributor. This is done by mixing the output pulses of one-shots 14-16 in the multi-input gate 28, the output of which is inverted in block 29 and applied (through d-c coupling) to the distributor 25 and sense-amplifiers 8-10 for reset operation, after which period the distributor starts hunting the output signals of the sense-amplifiers. In this new start, assume that the output of filter 2 is zero; the output of sense-amplifier 8 is then also at "0" level, and accordingly, the first distribution pulse does not operate the gate 11. If, however, the output of sense-amplifier 9 is at "1" level, the gate 12 operates at the arrival of the second distribution pulse, and according to the previously described mode of operation, the flip-flop 17 is driven into reset operation, and the flip-flop 18 is driven into set operation for a new signal regrouping at the channel outputs of block 24. The pulse frequency of the pulse generator in block 30 is not critical, and may be adjusted to any useful frequency as desired. The gate 31 may be used to gate out the output pulses of generator 30 during resetting of the distributor.
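The hunting sequence of the distributor and sense-amplifiers may be modeled, at a purely logical level, by the sketch below. It ignores pulse timing, one-shots and coupling, and only shows which flip-flop ends in the set state. The names hunt_pitch and flip_flop_states and the threshold value are illustrative assumptions.

```python
def hunt_pitch(filter_levels, threshold):
    """Return the 0-based position of the first filter output above threshold.

    Models the distributor stepping through the gates in sequence: the first
    gate whose sense-amplifier is at "1" level operates. Returns None when no
    filter output exceeds the threshold.
    """
    # Sense-amplifier outputs: True for "1" level, False for "0" level.
    sense = [level > threshold for level in filter_levels]
    for position, active in enumerate(sense):  # sequential distributor pulses
        if active:
            return position
    return None


def flip_flop_states(count, selected):
    """Set only the selected flip-flop; all others are in reset state."""
    return [index == selected for index in range(count)]


# Filter 2 output is zero, the next filter carries the pitch signal.
levels = [0.0, 0.6, 0.9]
selected = hunt_pitch(levels, threshold=0.1)
print(selected)                       # position of the operated gate
print(flip_flop_states(3, selected))  # states of flip-flops 17, 18, 19
```

The selected position corresponds to the row of the regrouping chart that is then applied to the channel switches.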
MATCHING OF VOICEPRINTS WITH THAT OF PRERECORDED VOICEPRINT SIGNALS
As has been stated in the foregoing, all information bearing signals that are prone to variations, must necessarily however, contain the component signals for specificity during specific variation. Thus, a specific group of component signals within a larger group of component signals must represent the information of specificity. By the signal-regrouping arrangement described in the foregoing, it is only necessary to select a significant group of signals from the regrouped signals at the channel outputs of block 24 in FIG. 1, and match with prerecorded group of specific signals for identification of the information sought. This may be accomplished by the matching arrangement of FIG. 2.
In FIG. 2, the output signals of channel-1 through channel-n are differentiated with the signals which are prerecorded in the a, a, a, a; b, b, b, b; c, c, c, c blocks, in the operational-amplifiers 36-39. These signal-differentiating circuits at the channel outputs are similar, and therefore, description of the circuitry in reference to one of the channels is typical of the rest of the channels. Thus, assuming that the two input signals of the operational amplifier in block 36 are of equal amplitudes, the output is zero, or at least below a threshold level. On the other hand, when these two input signals are of unequal amplitudes, the output of block 36 is above zero, and its polarity depends on which of the two input signals is larger than the other. Because of this undetermined output polarity, the output of amplifier 36 is passed through a push-pull output amplifier in block 40, and applied to a sense-amplifier in block 44, which is provided with a one-shot output. Thus, when the two input signals of block 36 are of equal amplitudes, the output of sense-amplifier 44 is zero, and when these two input signals are of unequal amplitudes, the sense-amplifier 44 produces a one-shot pulse output. The output of sense-amplifier 44 is applied to one of the inputs of gate 49 in series with the normally idle switching transistor Q11. In a signal matching practice, we may assume that when the two matching signals are of equal amplitudes, it represents the correct information sought -- and it is false when the two signals differ in amplitudes. As an exemplary condition, accordingly, when the transistor Q11 is switched-ON, and the output of sense-amplifier 44 is zero, the information fed to the gate 49 is correct, whereas, when the transistor Q11 is switched-ON, and the sense-amplifier 44 produces an output pulse, the information fed to the gate 49 is false. By combining the functional operations of different matching channels, accordingly, we may obtain the following:
Assume that the a, a, a, a signals of the prerecorded signals are to be matched with the arriving signals from the channel outputs, and that the prerecorded signals are significant only in the channel-1, channel-2, and channel-n -- the blank block of channel-3 indicating that there will be no matching of a signal. The a blocks (having matching signals present) apply ON signals to the gate-electrodes of their respective switching transistors Q11, Q12, and Q14, for admitting the respective source-terminal outputs to the inputs of multi-input gate 49, as shown. The a blocks of these channels also apply the matching signals (analog signals) to the first inputs of AND gates 50, 51 and 53. Now all operating conditions being ready, the pulse generator in block 54 applies a pulse to all of the second inputs of the AND gates 50--53, and to all of the switches (not shown in order to avoid crowding of the drawing) between the amplitude equalizers in blocks 32--35 and the operational-amplifiers 36--39, respectively, for operation. Thus assuming that the signal matching of the arriving signals is correct in the channel outputs (channel-1, channel-2, channel-n), the switched-ON outputs of Q11, Q12 and Q14, and also the pulse from the generator 54, will be at "1" level to the inputs of gate 49, and therefore, the output of this gate will be at "0" level, as a signal representing the correct identification of the individual in question. On the other hand, if one of the input signals to the gate 49 is other than "1" level, the output of this gate will not be at "0" level, indicating false signal. But a decision on matching of only the a signals may not be sufficient, and therefore, the b signals may also be tried for matching, and getting an average prior to a final decision of correctness.
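The matching decision may be summarized in software terms by the sketch below. It is a simplified model rather than the circuit: each channel is reduced to a single amplitude value, a blank prerecorded block is represented by None, and the tolerance stands in for the threshold of the sense-amplifiers. The names match_voiceprint and gate_49_level, and all numeric values, are hypothetical.

```python
def match_voiceprint(channels, stored, tolerance):
    """Return True when every significant stored signal matches its channel.

    `stored` entries of None are blank blocks: no matching on that channel.
    A channel matches when the amplitude difference is within `tolerance`.
    """
    for arriving, reference in zip(channels, stored):
        if reference is None:
            continue  # blank block, channel not considered
        if abs(arriving - reference) > tolerance:
            return False  # unequal amplitudes: false information
    return True


def gate_49_level(is_match):
    """Follow the text's convention: "0" level output for a correct match."""
    return 0 if is_match else 1


# Prerecorded "a" signals significant in channel-1, channel-2 and channel-n.
stored_a = [1.0, 0.5, None, 0.2]

good = match_voiceprint([1.0, 0.5, 0.9, 0.2], stored_a, tolerance=0.05)
bad = match_voiceprint([1.0, 0.8, 0.9, 0.2], stored_a, tolerance=0.05)
print(gate_49_level(good), gate_49_level(bad))
```

The same function could be applied to the b and c signal sets in turn before a final decision is taken.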
In some cases it may be desired that, instead of ON and OFF decisions made in reliance on the output of gate 49, a fine gradient of combined signals at the outputs of operational-amplifiers 36-39 is further analyzed, and these signals are readily obtainable at the outputs of amplifiers 36-39. Similarly, since the channel-output signals are regrouped, it may be necessary to know in what frequency range of the sound spectrum the original voice sound originated. This information is also readily provided by the outputs of flip-flops 17-19 in FIG. 1, as indicated by the arrowed output terminals.
Since the main object of the matching system described herein is a modification (signal regrouping) of the signals prior to matching, it is obvious that the prerecorded signal must be modified in the same manner, as described by way of the arrangement in FIG. 1. Besides signal regrouping, however, there are other variables that must be modified, for example, the amplitude variation that occurs during each segment of the information to be analyzed. In the case that the arriving signals are first regrouped and recorded for matching with the prerecorded signals, the amplitudes of both recorded signals may be adjusted for the proper matching process. But in this case also, the original voice signal varies in amplitude uncontrollably from time to time, even when tried to duplicate a previous articulation, especially when the sequence of amplitude modulations in a whole spoken word are to be considered for matching. For this reason, it is preferable to include an automatic amplitude control of the channel output signals.
The automatic amplitude control is achieved by the equalization amplifiers in blocks 32-35, but the equalization is controlled only by the output signal of channel-1, as reference amplitude for matching. For example, as shown in the drawing of FIG. 2, the amplitude equalization is established by the peak signal in channel-1, as a feedback control to a reference level. This may be explained in more clarity by a simple equalization arrangement of FIG. 7, as in the following.
AMPLITUDE EQUALIZER
In FIG. 7, the channel output is applied to the resistor R4 in series with the variable-resistance transistor Q15, so that the transfer from channel-1 to R4 may be voltage divided by change of resistance of the transistor Q15. The gate electrode of Q15 is normally forward biased by B to the lowest reference resistance between drain and source electrodes of Q15, and this forward bias is variably reduced by an opposing voltage stored in the capacitor C through a feed-back switching path. The information to be analyzed occurs between the peaks of the pitch wave in the voice sound, and this pitch wave is represented by the output wave of the first channel. Pitch selection has been used previously for various purposes, and different circuit arrangements already exist, which may be utilized for deriving pitch pulses either directly from the original voice sound wave, or from the channel-1 output of the arrangement shown in FIG. 1. Thus, the control voltage of capacitor C is changed during the short pitch pulses by operation of the normally OFF switches in blocks 56 and 57 for negative feed-back. This feed-back is controlled by the zener diode D1, which becomes conductive in reverse direction only when the voltage across RL is above a reference voltage. Accordingly, when a pitch pulse operates the switches in blocks 56 and 57, and assuming that the amplified voltage across load resistor RL is above the zener regulation voltage, the capacitor C is charged in polarity as to increase the internal resistance of Q15 to a point where the voltage across C is equal to the zener regulation voltage. When the switches 56 and 57 become in OFF states again, the channel output signal, as transferred to R4, has been regulated to a reference level, and this reference level has also been established in the rest of the channel outputs by the parallel connection of the capacitor C to each one of the amplitude equalizers that are used at the other channel outputs. 
In controlling the signal amplitude across R4, and assuming that the voltage across C is in the forward direction during a pitch pulse period, the capacitor C is discharged until the reference voltage regulation has been established. The use of the switch in block 57 may be preferable when the amplifier 55 is chosen to be a capacitively coupled amplifier, but actually the choice is irrelevant herein.
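The effect of the equalizers, in which the channel-1 peak sets one control voltage shared by all channels, may be approximated numerically as a common gain. The sketch below is an idealized model that assumes the regulation settles exactly at the reference level within one pitch period; the function name equalize and the numeric values are illustrative only.

```python
def equalize(channel_peaks, reference_level):
    """Scale all channels by one common gain derived from channel-1.

    Idealized model: the gain brings the channel-1 peak exactly to
    `reference_level`, and the same gain is applied to every channel,
    mirroring the shared control voltage on capacitor C.
    """
    peak_1 = abs(channel_peaks[0])
    if peak_1 == 0:
        return list(channel_peaks)  # no pitch signal, nothing to regulate
    gain = reference_level / peak_1
    return [peak * gain for peak in channel_peaks]


# A loud and a soft utterance with the same relative channel amplitudes.
loud = equalize([2.0, 1.0, 0.5], reference_level=1.0)
soft = equalize([0.5, 0.25, 0.125], reference_level=1.0)
print(loud, soft)
```

Because only the ratios between channels survive, the same articulation spoken at different loudness yields the same equalized pattern for matching.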
In reference to commercially available AND gates, they are generally referred to as being on-and-off devices by two input signals at "1" levels. Whereas, the AND gates 50-53 are actually analog switches. But the meaning of the AND gate is broad, and some analog switches are now appearing in the name of AND gates with some modified terminology attached to it. Accordingly, the term AND gate used herein does not necessarily mean having two on-and-off input signals, but one of the signals may vary in magnitude, and the other used as an on-and-off switching signal. Gates of this type are used between the amplitude equalizers 32-35 and the operational amplifiers 36-39, which are not shown in the drawing, but their requirement is referenced in writing by the pulses of the pulse-generator 54 applied to them during the signal matching performance.
When the system described herein is used for the purpose of selecting a particular individual from a group of speakers for some required performance, the system may be incorporated in conjunction with my phonetic sound recognition systems, for example, as described in my U.S. Pat. No. 3,622,706, so that during phonetic sound recognition an output switch is operated only when simultaneously voiceprint recognition occurs. In view of the broad usefulness of the system disclosed herein, accordingly, it becomes obvious that the specific arrangements described herein are exemplary, and various modifications, adaptations and substitutions of parts may be made without departing from the true spirit and scope of the invention.