United States Patent 3715512

In an adaptive, predictive coder for speech signals, the transmitted signal generally consists of an rms value, a pitch signal, a voice-unvoiced indication, and a number of parameter signals for adjusting the coefficients of a linear predictor. Transmission of these signals is improved in this invention by generating a low rate pulsive signal and by shaping its spectrum in accordance with the parameter signals. The pulsive signals thus act as a carrier for the parameters. The bandwidth required for transmitting the resulting composite signal, i.e., the modulated pulsive signal and the subsidiary signals, is substantially less than that of the original speech signal and somewhat less than that required for the transmission of predictively coded signals.

Other Classes:
704/207, 704/208
International Classes:
G10L19/04; (IPC1-7): G10L1/06
Field of Search:
179/1SA,15.55R 324
Other References:

Atal & Hanauer, Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, JASA, 1971, pp. 637-655.
Primary Examiner:
Blakeslee, Ralph D.
Assistant Examiner:
Leaheey, Jon Bradford
What is claimed is

1. A band compression system for the transmission of speech signals, which comprises:

2. A band compression system for the transmission of speech signals, which comprises:

3. A speech signal coder for developing reduced bandwidth signals for transmission, which comprises:

4. A speech signal coder as defined in claim 3, wherein,

5. A speech signal coder as defined in claim 3, wherein,

6. A speech signal coder as defined in claim 3, wherein said synthesizer comprises:

7. In combination with the speech coder as defined in claim 3,

8. An adaptive predictive speech signal coding system, which comprises:

This invention relates in general to the modification of signals to facilitate their transmission, and particularly to the reduction of their information rates and, hence, to the compression of their frequency bands. Its principal object is to compress the band of frequencies occupied by a speech message wave. A more general object is to apply new principles to the band compression or other modification of a speech message wave.


The aim of efficient coding methods is to reduce the channel capacity required to transmit a signal with specified fidelity. To achieve this objective, it is often essential to reduce, so far as possible, redundancies in the transmitted signal. One well-known procedure for reducing redundancy is predictive coding. In predictive coding, that part of an applied signal that can be predicted from its past is subtracted from the signal to leave only an error, or difference signal, for transmission. For many signals, the spectrum of the difference signal is much flatter than the spectrum of the original signal. Thus, the difference signal is better suited to memoryless encoding than the original.
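The subtraction step described above can be sketched in a few lines. This is a minimal illustration only, assuming a fixed predictor whose coefficients are already known; the function names `prediction_residual` and `reconstruct` and the test signal are hypothetical and are not taken from the patent.

```python
def prediction_residual(signal, coeffs):
    """Difference signal d[i] = s[i] - sum_j coeffs[j-1] * s[i-j]."""
    residual = []
    for i, s in enumerate(signal):
        predicted = sum(a * signal[i - j]
                        for j, a in enumerate(coeffs, start=1) if i - j >= 0)
        residual.append(s - predicted)
    return residual


def reconstruct(residual, coeffs):
    """Inverse operation: add the prediction back to each difference sample."""
    out = []
    for i, d in enumerate(residual):
        predicted = sum(a * out[i - j]
                        for j, a in enumerate(coeffs, start=1) if i - j >= 0)
        out.append(d + predicted)
    return out


# Hypothetical input: a decaying exponential, which a first-order
# predictor with coefficient 0.9 predicts exactly after the first sample.
signal = [0.9 ** n for n in range(8)]
residual = prediction_residual(signal, [0.9])
restored = reconstruct(residual, [0.9])
```

In this toy case only the first difference sample is non-zero, which is the sense in which the predictable part of the signal carries no new information.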


Predictive coding of speech signals avoids many of the difficulties encountered with conventional vocoders and vocoder-like devices. Because of the rapidly changing structure of a speech signal, however, conventional linear prediction arrangements, often satisfactory for video coding, fail adequately to track a speech signal. This difficulty is overcome in an adaptive predictive coding arrangement for speech signals described in a copending application of B. S. Atal, Ser. No. 753,408, filed Aug. 19, 1968, now U.S. Pat. No. 3,631,520, granted Dec. 28, 1971. According to the Atal arrangement, the predictable portion of the speech signal is identified and, as in a conventional predictive coding arrangement, the remainder is treated as a difference signal for transmission. Because of the nonstationary nature of speech signals, however, the Atal system employs an adaptive predictor which is readjusted periodically to match the time varying characteristics of the speech signal. Such an adaptive system has been shown to afford substantial transmission economies and to produce high quality synthesized speech.

Moreover, the parameter signals themselves have been found to represent adequately the spectral envelope shape of an input speech wave. Thus, in B. S. Atal U.S. Pat. No. 3,624,302, granted Nov. 30, 1971, a linear predictor vocoder system is described which is capable of producing high quality synthetic speech on the basis of predictor parameters and certain other speech signal values. A pitch detector is used, in accordance with the invention of that patent, as in a channel vocoder, to determine either the fundamental frequency of the voice wave or the absence of such a fundamental frequency. Speech signal parameters are continuously developed at a transmitter station, using the constraint that the applied speech wave at any instant is a weighted form of its past values. An Nth order linear predictor is adjusted to produce a minimum mean squared error signal. The predictor is basically a filter that has a spectral response that is a good match to the inverse of the spectral envelope of the input speech wave. The N predictor coefficients, together with the pitch frequency and the rms value of the input speech constitute the analyzer output signal, which may be transmitted to a receiver station. The receiver station employs a recursive filter in a feedback loop fed either by pulses at the pitch rate or by noise. The filter imposes a spectral envelope shape on the pitch pulses or noise to produce a signal which is a close match to the spectral envelope of the input speech. Amplifier gain in the system is adjusted to produce the correct rms value in the synthetic speech.
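As a rough sketch of how N predictor coefficients that minimize the mean squared prediction error might be computed, the following uses the autocorrelation method with the Levinson-Durbin recursion. This is a stand-in chosen for brevity; the analysis procedure of the cited patent is not reproduced in this text and may differ. The function names and the test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_i x[i] * x[i-k] for k = 0..max_lag."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients a[1..order] of the predictor s[i] ~ sum_j a[j] * s[i-j].

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test signal: impulse response of a known 2nd-order
# all-pole filter, long enough to have decayed to a negligible level.
true_coeffs = [1.3, -0.6]
h = []
for i in range(400):
    past = sum(c * h[i - j]
               for j, c in enumerate(true_coeffs, start=1) if i - j >= 0)
    h.append((1.0 if i == 0 else 0.0) + past)

coeffs, residual_energy = levinson_durbin(autocorrelation(h, 2), 2)
```

For this input the recovered coefficients should match `true_coeffs` closely, illustrating that the predictor captures the spectral envelope imposed by the filter.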

Such a predictive coding system permits efficient digital encoding of speech signals. Yet, it is often desirable to utilize analog facilities for the transmission of encoded speech, a task not readily possible with a predictive coding arrangement.

It is, therefore, an object of this invention, to achieve analog bandwidth compression and transmission economies in a linear prediction vocoder system.


In accordance with this invention, a linear prediction vocoder analyzer is used to determine the spectral envelope of a speech signal, and an auxiliary spectral envelope signal is thereupon generated which is a frequency compressed value of the original spectral envelope. In essence, an analog signal is developed for transmission which contains all of the information needed to control a linear prediction vocoder synthesizer, for example, of the type described by Atal in U.S. Pat. No. 3,624,302, but which requires appreciably less bandwidth for transmission than either the original speech wave or prediction vocoder signals.

According to the invention, a coded signal for transmission is developed by generating a train of low rate pulsive signals and shaping the spectral envelope of the train in accordance with parameter signals which represent the spectral envelope of the speech signals. By employing a sampling rate in this spectral shaping process which is substantially lower than the rate of sampling of the input wave, the spectrum imposed on the low rate pulses is frequency compressed by a factor equal to the ratio of these two rates. The amplitudes of the pulses are thus varied to produce spectrally shaped pulses with the same rms values as the input speech. Preferably, the sampled data signal is converted into analog form prior to transmission. It may be combined with the pitch and voicing characteristic signals, for example, in a multiplexor, prior to transmission.

At the receiver, the inverse process is employed. Prediction parameters are recovered from the shaped pulsive signals and used, together with the pitch and voicing characteristic signals to synthesize a close replica of the original speech signal.


The invention will be fully apprehended from the following description of a preferred illustrative embodiment thereof, taken in connection with the appended drawing in which:

FIG. 1 is a block schematic diagram of an adaptive predictive speech signal coding arrangement which illustrates the principles of the invention;

FIG. 2 illustrates a linear predictor suitable for use in the practice of the invention; and

FIG. 3 is a block schematic drawing of a predictive speech signal decoder and synthesizer which may be used in the practice of the invention.


A transmitter station for developing narrow bandwidth signals for each of a number of voice channels is illustrated in the block diagram of FIG. 1. The station may include a number of individual speech signal coders 10_1 through 10_N, each supplied with speech frequency signals, e.g., from transducers 11_1 through 11_N, and each delivering its coded output signal S to a transmission coder 12, of any desired construction. Coder 12 may be a multiplex unit, or the like, for combining the several supplied signals prior to transmission.

Each of coders 10 may be substantially identical in construction, with one, 10_1, being illustrated in detail. Speech signals from transducer 11_1 are passed through filter 13, which typically has a cut-off frequency in the neighborhood of 4 kHz. The resultant signal is delivered to an analyzer which operates at a sampling rate of 8 kHz. Hence, the speech signal is sampled in unit 14 at a frequency of approximately 8 kHz under control of pulses from clock 15. Samples of the input signal are delivered to rms detector 16, predictor parameter computer 17, and voicing detector 18. In response, rms detector 16 develops a signal which characterizes the short-time energy of the signal, and voicing detector 18 develops two signals, namely, a signal P which denotes the pitch frequency or period of the input signal, and a voicing character signal V-UV which denotes whether the applied signal is voiced or unvoiced. Prediction parameter computer 17 develops a set of parameter values A=a1, . . . , aN, for each pitch period which implicitly describe the spectral envelope of the input speech signal below 4 kHz.

The construction, function, and mode of operation of the processing elements of the analyzer described hereinabove are disclosed in detail in the above-cited U.S. Pat. No. 3,624,302 to Atal. As in the Atal system, the pitch frequency signal P and the voice-unvoiced signal V-UV are delivered to multiplexor 19 for eventual transmission. Unlike Atal, however, the rms signal and the several parameter signals are delivered, in accordance with this invention, to a linear prediction synthesizer for further processing prior to transmission.

Accordingly, pulse generator 20 is employed to develop pulses at a frequency proportional to the sampling frequency of the input signal. Suitable pulses are conveniently developed by dividing pulses from clock 15 by a factor C in divider network 21. For example, if samples of the input signal are developed at an 8 kHz rate, and pulse generator 20 is adjusted to issue pulses at a 50 Hz rate, divider 21 is equipped with a divisor factor C=8 kHz/50 Hz=160. The resulting low rate pulses from generator 20 are supplied to amplifier 22, whose gain is controlled by signals α from computer 23, to produce a signal at the output of combiner 24 whose rms value varies in accordance with the rms value of the input speech. The variable level pulses are added in combiner 24 to predicted values of the spectrally shaped pulses developed in predictor 25. The resulting signal, i.e., the sum of the input signal to combiner 24 and the predicted value of it formed in predictor 25, constitutes the compressed bandwidth output signal.
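The divider arithmetic and the gain-controlled pulse train can be illustrated as follows. This is a sketch under stated assumptions: time is counted in ticks of the 8 kHz clock, the function name `low_rate_pulses` is hypothetical, and the gain values stand in for the α signals that would come from computer 23.

```python
CLOCK_HZ = 8000          # rate of clock 15, from the text
PULSE_HZ = 50            # rate of pulse generator 20, from the text
C = CLOCK_HZ // PULSE_HZ  # divisor factor of divider 21


def low_rate_pulses(n_ticks, divisor, gains):
    """One pulse every `divisor` clock ticks; the k-th pulse is scaled by gains[k]."""
    out = [0.0] * n_ticks
    for k, t in enumerate(range(0, n_ticks, divisor)):
        out[t] = gains[k]
    return out


# Three frames with hypothetical gain values (placeholders for alpha).
pulses = low_rate_pulses(3 * C, C, gains=[0.5, 1.0, 0.25])
```

Each non-zero entry corresponds to one variable-level pulse at the output of amplifier 22; everything between pulses is zero input to the combiner.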

Since the sampling rate used for spectral shaping is a factor M (analyzer sampling rate/synthesizer sampling rate) less than the sampling rate of applied signals, the spectrum imposed on the low rate pulses from generator 20 in the synthesizer is frequency compressed by a factor M over the input speech spectral envelope. For analog transmission, this signal is passed through a digital to analog converter, for example, low pass filter 26. Filter 26 typically exhibits a low frequency cut-off of 4/M kHz. The resulting analog signal, fL, is delivered to multiplex unit 19 where it is combined with the pitch and voicing signals to produce a compressed bandwidth signal S1 for transmission.
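The compression by a factor M follows from running the same predictor coefficients at a lower sampling rate: a pole at a fixed angle on the unit circle maps to a frequency proportional to the sampling rate. The sketch below illustrates this for a single pole pair. The value M = 4, the pole radius, and the 1 kHz test resonance are hypothetical choices; the text does not specify M.

```python
import math


def pole_frequency_hz(a1, a2, sample_rate_hz):
    """Frequency of the complex pole pair of x[i] = a1*x[i-1] + a2*x[i-2] + e[i]."""
    radius = math.sqrt(-a2)                 # valid for complex poles (a2 < 0)
    angle = math.acos(a1 / (2.0 * radius))  # pole angle in radians per sample
    return angle / (2.0 * math.pi) * sample_rate_hz


ANALYZER_RATE_HZ = 8000
M = 4                                       # hypothetical compression factor
synth_rate_hz = ANALYZER_RATE_HZ / M        # rate used for spectral shaping

# Coefficients of a resonance placed at 1 kHz when clocked at 8 kHz.
r = 0.9
a1 = 2.0 * r * math.cos(2.0 * math.pi * 1000.0 / ANALYZER_RATE_HZ)
a2 = -r * r

f_in = pole_frequency_hz(a1, a2, ANALYZER_RATE_HZ)   # about 1000 Hz
f_out = pole_frequency_hz(a1, a2, synth_rate_hz)     # about 250 Hz
cutoff_hz = 4000 / M                                  # cut-off of filter 26
samples_per_frame = 160 // M                          # K = C/M
```

With these assumed values the whole envelope below 4 kHz is squeezed below 1 kHz, which is why a filter with a 4/M kHz cut-off suffices.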

Computer 23 may comprise an arithmetic processor using conventional analog circuit techniques or may comprise a special purpose computer or subroutine for evaluating certain arithmetic expressions. Its function is to control the amplitude of pulses produced at the lower rate which are used to synthesize an analog signal, narrower in bandwidth than the input signal, but containing all of the information needed for proper synthesis of the input signal. In essence, spectral shaping of pulses takes place in a linear predictor system (predictor 25 and combiner 24), which behaves as a linear system with a controlled impulse response. In this system, the input is known and the quiescent response is known. It is necessary, therefore, only to adjust the input to produce the desired output, a signal whose energy is equivalent to the mean-squared value of the input signal. The initial conditions of the linear system are taken into account by employing the principle of superposition, i.e., the output of the system is equal to the sum of the components due to the input signal and to the initial conditions of the system. Thus, if the output of the system due to the passage of a unit input pulse through it is Z(t), and the output of the system due to the initial conditions is Y(t), then the output energy E0 is given by

$$E_0 = \left\langle \left[ \alpha Z(t) + Y(t) \right]^2 \right\rangle , \qquad (1)$$

where α represents the amplitude of pulsive signals for each set of parameter values A, established for the analyzer. It is, thus, a simple arithmetic procedure to solve Equation (1) for α.

More rigorously, the processing of computer 23 takes place for each frame of K samples of the output signal which spans the interval between changes in the values of parameters A. To compute α, it is necessary that

where L represents the sample number at the beginning of a frame, e.g., a 20 ms interval, of K samples, and xi represents the compressed bandwidth signal. In the illustrated example, K = C/M = 160/M. Signal Xi is then represented by

where ei = α for i = L, and ei = 0 for i = L+1, L+2, . . ., L+K.

If Yi represents the response of the linear predictor to the last N output samples in the previous frame of K samples, at a time when there is a zero input to the predictor, the ith sample of the compressed signal is

Xi = Yi + α Zi , (4)

where Zi represents the response of the linear predictor to an impulse of unit magnitude and with an initial condition of zero. Therefore,


Y_{i-j} = X_{i-j} for i - j < L.

Under these conditions, the rms value of the output signal must be

Since Yi, Zi and Erms are known, the quadratic equation yields two values of α, of which the larger value is selected for use.
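The displayed expressions for Equations (2), (3), (5), and (6) do not survive in this text. The block below is one plausible reconstruction inferred from the surrounding definitions, not a transcription of the patent. In particular, the summation limits, the 1/K normalization, and the form of Equation (5) are assumptions; x_i denotes the compressed bandwidth signal, which the text writes as both xi and Xi.

```latex
% Assumed reconstruction; limits and normalization are not confirmed by the source.
\begin{align}
E_{rms}^2 &= \frac{1}{K} \sum_{i=L}^{L+K-1} x_i^2 \tag{2} \\
x_i &= \sum_{j=1}^{N} a_j \, x_{i-j} + e_i \tag{3} \\
x_i &= Y_i + \alpha Z_i \tag{4} \\
Y_i &= \sum_{j=1}^{N} a_j \, Y_{i-j},
      \qquad Y_{i-j} = x_{i-j} \ \text{for } i-j < L \tag{5} \\
E_{rms}^2 &= \frac{1}{K} \sum_{i=L}^{L+K-1} \left( Y_i + \alpha Z_i \right)^2 \tag{6}
\end{align}
% Expanding (6) gives a quadratic in alpha:
%   alpha^2 * sum(Z_i^2) + 2 * alpha * sum(Y_i Z_i) + sum(Y_i^2) - K * E_rms^2 = 0
```

Under these assumptions, Equation (6) is the quadratic whose two roots are referred to in the text.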

The computed value of α for each set of parameter values A is thus supplied to control the gain of amplifier 22 and, hence, to control the magnitude of the pulse from generator 20 during the entire frame.
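A minimal numerical sketch of the α computation follows. It assumes that the frame energy is the mean of the squared samples over K samples, that Y is the zero-input response started from the last N outputs, and that Z is the unit-impulse response from rest; these follow the prose description, but the exact expressions are not reproduced in this text. All function names are hypothetical.

```python
import math


def zero_input_response(coeffs, past, k):
    """Y: k predictor outputs with no excitation; `past` holds prior outputs, oldest first."""
    hist = list(past)
    out = []
    for _ in range(k):
        n = len(hist)
        y = sum(a * hist[n - j]
                for j, a in enumerate(coeffs, start=1) if n - j >= 0)
        hist.append(y)
        out.append(y)
    return out


def unit_impulse_response(coeffs, k):
    """Z: k outputs for a unit pulse at the first sample, starting from rest."""
    out = []
    for i in range(k):
        z = (1.0 if i == 0 else 0.0) + sum(
            a * out[i - j] for j, a in enumerate(coeffs, start=1) if i - j >= 0)
        out.append(z)
    return out


def solve_alpha(coeffs, past, k, e_rms):
    """Larger root of mean((Y + alpha*Z)^2) = e_rms^2 over a frame of k samples."""
    y = zero_input_response(coeffs, past, k)
    z = unit_impulse_response(coeffs, k)
    szz = sum(v * v for v in z)
    syz = sum(p * q for p, q in zip(y, z))
    syy = sum(v * v for v in y)
    disc = syz * syz - szz * (syy - k * e_rms * e_rms)
    # Assumption: if no real root exists, clamp to the double root.
    disc = max(disc, 0.0)
    return (-syz + math.sqrt(disc)) / szz


# Hypothetical first-order example.
coeffs, past, k, target = [0.5], [0.3], 4, 1.0
alpha = solve_alpha(coeffs, past, k, target)
y = zero_input_response(coeffs, past, k)
z = unit_impulse_response(coeffs, k)
frame = [yi + alpha * zi for yi, zi in zip(y, z)]
frame_rms = math.sqrt(sum(v * v for v in frame) / k)
```

The clamp on the discriminant is a design choice of this sketch only; the text does not say how a frame with no real solution would be handled.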

A suitable linear predictor, e.g., for use as element 25 in the system of FIG. 1, is illustrated in FIG. 2. It employs a transversal filter which includes a tapped delay line, a plurality of associated amplifiers and a summing network. Signals from combiner 24 are delivered to elements 27_1, 27_2, . . ., 27_N of the delay line and the intermediate delayed signals are passed through amplifiers 28_1, 28_2, . . ., 28_N, respectively. Amplifiers 28 are adjusted by parameter signals A from computer 17 and the resulting adjusted signals are summed in unit 29. The summed output signal represents a predicted value of the incoming signal based on a plurality of its past values. This signal is delivered to one input of combiner 24. A frame of past sample values x1, . . ., xN is available at the intermediate points of delay line 27. These sample values are delivered to computer 23 (FIG. 1).
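The tapped delay line, amplifiers, and summing network can be modelled in software roughly as below. This is an illustrative sketch, not the circuit of FIG. 2: the class name `TransversalPredictor`, its methods, and the helper `shape` (which plays the role of the combiner feeding the predictor) are hypothetical.

```python
from collections import deque


class TransversalPredictor:
    """Tapped delay line with one gain per tap and a summing network."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)                 # amplifier gains a_1..a_N
        n = len(self.coeffs)
        self.delay_line = deque([0.0] * n, maxlen=n)  # newest sample first

    def predict(self):
        """Weighted sum of the delayed samples (output of the summing unit)."""
        return sum(a * x for a, x in zip(self.coeffs, self.delay_line))

    def push(self, sample):
        """Shift a new sample into the delay line, dropping the oldest."""
        self.delay_line.appendleft(sample)

    def past_samples(self):
        """Samples at the taps, newest first (the values handed to the gain computer)."""
        return list(self.delay_line)


def shape(excitation, predictor):
    """Combiner loop: output = excitation + predicted value, fed back to the predictor."""
    out = []
    for e in excitation:
        x = e + predictor.predict()
        predictor.push(x)
        out.append(x)
    return out


predictor = TransversalPredictor([0.5, 0.0])
response = shape([1.0, 0.0, 0.0, 0.0], predictor)
```

After the loop, `predictor.past_samples()` holds the last N outputs, which is the state a following frame would start from.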

FIG. 3 illustrates a receiver station, in accordance with the invention, which includes a number of separate speech signal decoders 30. Each decoder serves to analyze the supplied coded signal received from the transmitter and to develop therefrom the parameter values needed for synthesis. The composite signal input S is initially separated in transmission decoder 31 into the several channel signals S1 through SN. Each channel signal is then applied to the corresponding decoder wherein a replica of the input speech is developed and delivered, for example, to an associated loudspeaker 32.

One of the decoders, namely 30_1, is shown in detail. Channel signal S1 is delivered to demultiplexer 33 to separate the several parameter values from one another. Pitch signal P is delivered to a synthesizer, operating at an 8 kHz rate, and is used to control the generation of periodic pulses in generator 34. The voicing characteristic signal V-UV is employed in the synthesizer to actuate selector 35. Pitch period pulses from generator 34 are amplified in controlled amplifier 36 in accordance with the value of control signal α', and supplied to one input of selector 35. This variable amplitude pulse signal constitutes a voiced excitation signal for use in the synthesis operation. Similarly, unvoiced excitation is produced in noise generator 37. Generator 37 typically produces a sequence of random pulses that are controlled in amplitude in amplifier 38 under control of α'. Depending upon the voicing character of the speech signal, selector 35 delivers voiced energy from amplifier 36 or unvoiced energy from amplifier 38 to combining network 39.
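The choice between voiced and unvoiced excitation can be sketched as follows. The function name `excitation` is hypothetical, and modelling the noise generator as random pulses of polarity ±1 is an assumption; the text says only that it produces a sequence of random pulses.

```python
import random


def excitation(n_samples, voiced, pitch_period, gain, rng):
    """Excitation for one stretch of samples at the synthesizer rate.

    voiced:   pulses of amplitude `gain` every `pitch_period` samples
    unvoiced: random-polarity pulses of amplitude `gain` (assumed noise model)
    """
    if voiced:
        return [gain if i % pitch_period == 0 else 0.0
                for i in range(n_samples)]
    return [gain * rng.choice((-1.0, 1.0)) for _ in range(n_samples)]


rng = random.Random(0)  # seeded so the sketch is repeatable
# A 100 Hz pitch at an 8 kHz synthesizer rate gives a period of 80 samples.
voiced_frame = excitation(160, True, 80, 2.0, rng)
unvoiced_frame = excitation(160, False, 80, 2.0, rng)
```

The `voiced` flag plays the role of the V-UV signal acting on the selector, and `gain` plays the role of α'.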

The compressed bandwidth signal fL developed at the transmitter station is supplied from demultiplexer 33 to an analyzer similar to the analyzer at the transmitter station. However, the receiver analyzer operates at a sampling rate of 8/M kHz. In the analyzer, signals from demultiplexer 33 are delivered to sampler 40 and the resulting samples fs are delivered to prediction parameter computer 41, and to rms detector 42. Computer 41, of essentially the same construction as computer 17 at the transmitter station, reestablishes parameter values A = a1, . . ., aN. These values are delivered to α' computer 43 and to linear predictor 44. Detector 42, essentially identical to detector 16 at the transmitter station, delivers the rms value of the compressed bandwidth signal to computer 43. Computer 43 is also supplied with an array X = x1, . . ., xN of samples of the past frame of signals from predictor 44.

As with computer 23 at the transmitter, computer 43 develops a control signal α' for use in varying the amplitudes of both voiced and unvoiced synthesizer excitation signals, in amplifiers 36 and 38, respectively. Computer 43 evaluates the same equation as α computer 23 at the transmitter except that the value of K in Equation (2) is not constant at 160/M. Instead, it is equal to the pitch period P divided by 125 μs, i.e., 1/(8 kHz). Otherwise, computation with the supplied values is identical to that carried out at the transmitter.
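As a small worked example of the receiver frame length, the pitch period divided by the 8 kHz sample period gives K in samples. The function name `frame_length` and the rounding to the nearest integer are assumptions of this sketch.

```python
SAMPLE_PERIOD_US = 125.0   # 1 / (8 kHz), the synthesizer sample period


def frame_length(pitch_period_us):
    """K: number of synthesizer samples in one pitch period (rounded, by assumption)."""
    return round(pitch_period_us / SAMPLE_PERIOD_US)


k_100hz = frame_length(1e6 / 100)   # 10 ms pitch period
k_125hz = frame_length(1e6 / 125)   # 8 ms pitch period
```

A 100 Hz voice thus gives frames of 80 samples, and a 125 Hz voice gives frames of 64 samples.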

Linear predictor 44 is supplied with output signals from combiner 39 and develops a signal proportional to the predictable portion of the output signal. Amplifier settings in predictor 44 are changed in accordance with parameter values A from computer 41 during both voiced and unvoiced speech intervals. Gate 48 is thus used to determine the precise instants in time at which the amplifier settings in predictor 44 are changed in accordance with parameter values A from computer 41. It allows the amplifier settings of predictor 44 to be updated to the current values of parameter values A of computer 41 whenever a pulse is present at the output of selector 46.

Selector 46, under control of the voiced-unvoiced signal, selects the output of 50 Hz clock 47 during unvoiced intervals, or selects the output of pulse generator 34 during voiced intervals. Since predictor parameter computer 41 determines a new set of parameter values at a 50 Hz rate (corresponding to the rate at which prediction parameters are changed in the synthesizer at the receiver), the amplifier settings of predictor 44 are effectively changed at the occurrence of only some pitch pulses. For example, if the pitch frequency is 100 Hz, then the amplifier settings of predictor 44 are changed for every other pitch pulse. Preferably, these amplifier settings are changed for every pitch pulse. This may be done conveniently by interpolating between consecutive sets of parameter values A. The interpolation arrangement described by Atal in his aforementioned patent may be used for this purpose.
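One simple way to obtain a parameter set for every pitch pulse is to interpolate linearly between consecutive 50 Hz sets, as sketched below. Linear interpolation of the coefficients is an assumption made for illustration; the interpolation arrangement of the cited Atal patent is not reproduced in this text and may differ. The function name and the coefficient values are hypothetical.

```python
def interpolate_parameters(prev_set, next_set, pulses_per_frame):
    """Parameter sets for each pitch pulse within one 20 ms frame.

    Pulse p uses a weighted mix that moves linearly from prev_set toward
    next_set; the last pulse of the frame stops short of next_set, which
    the following frame then starts from.
    """
    sets = []
    for p in range(pulses_per_frame):
        w = p / pulses_per_frame
        sets.append([(1.0 - w) * a + w * b
                     for a, b in zip(prev_set, next_set)])
    return sets


PARAMETER_RATE_HZ = 50
pitch_hz = 100
pulses_per_frame = pitch_hz // PARAMETER_RATE_HZ   # two pitch pulses per frame

per_pulse = interpolate_parameters([1.0, -0.5], [0.6, -0.3], pulses_per_frame)
```

Note that interpolating predictor coefficients directly does not by itself guarantee a stable synthesis filter, so a practical design might interpolate a different representation of the parameters.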

The predictor output signal from unit 44 is returned to combiner 39 where it is added to the voiced or unvoiced excitation signal from selector 35 in conventional fashion. The output signal from the combiner is also delivered to a digital-to-analog converter, e.g., to low-pass filter 45, typically with a cut-off frequency of about 4 kHz. The resulting analog signal is delivered to loudspeaker 32_1.

Speech signal decoder 30_1 thus serves essentially to generate parameter values for synthesis by means of an analysis process and, thereupon, to synthesize a signal on the basis of local excitation, a pitch signal, and supplied compressed bandwidth information.