Title:

United States Patent 3631520

Abstract:

Predictive coding of signals, i.e., the reduction of redundancy in a signal by subtracting from it that part which can be predicted from its past, is a well-known technique for reducing the channel capacity required to transmit a signal with specified fidelity. It has been widely applied to signals, such as television signals, which have regularly repeating intervals of information, but has not been satisfactorily applied to signals, such as speech, which exhibit characteristics that vary from speaker to speaker and from time to time for one speaker. According to this invention, an adaptive predictor is employed which is readjusted periodically to match the time-varying characteristics of a speech signal.

Inventors:

ATAL BISHNU S

Application Number:

04/753408

Publication Date:

12/28/1971

Filing Date:

08/19/1968

Assignee:

BELL TELEPHONE LABORATORIES INC.

Primary Class:

Other Classes:

375/244, 375/250

International Classes:

Field of Search:

325/41,42,38.1 179

US Patent References:

3461244 | DELTA MODULATION SYSTEM WITH CONTINUOUSLY VARIABLE COMPANDER | 1969-08-12 | Brolin | |

3305854 | Sampled data system | 1967-02-21 | Witt | |

3236947 | Word code generator | 1966-02-22 | Clapper | |

3026375 | Transmission of quantized signals | 1962-03-20 | Graham | |

2732424 | N/A | 1956-01-24 | Oliver |

Other References:

J. E. Abate, "Linear and Adaptive Delta Modulation," Proceedings of the IEEE, Vol. 55, No. 3, pages 298-308, March 1967.

Primary Examiner:

Claffy, Kathleen H.

Assistant Examiner:

Leaheeny, Jon Bradford

Claims:

What is claimed is

1. Speech signal processing apparatus, which comprises:

2. Speech signal processing apparatus as defined in claim 1, wherein,

3. Speech signal processing apparatus as defined in claim 1, wherein new parameter signals are developed every 5 milliseconds.

4. Speech signal processing apparatus as defined in claim 1, wherein said means for predicting the present value of said applied speech signal comprises,

5. A communication system for conveying the information content of a speech signal over a channel of relatively small capacity which comprises, in combination:

6. A communication system as defined in claim 5 in further combination with,

7. A communication system as defined in claim 5 wherein said difference signals and said parameter signals are transmitted to said receiver station via diverse transmission facilities.

8. A communication system as defined in claim 5 wherein said parameter signals are scrambled according to a prescribed code for transmission.

9. Apparatus for predicting the present value of a speech signal from its past, which comprises:

10. Apparatus for developing parameter signals for use in the predictive coding of speech signals, which comprises, in combination:

11. Apparatus for reconstructing a speech signal from signals representative of the difference between the present value of said speech signal and a predicted value derived from past pitch period intervals thereof, which comprises,

Description:

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the efficient encoding of communication signals and to the reduction of the channel capacity required to transmit them. More particularly, it relates to the predictive coding of speech signals. It has for its object a reduction of redundancy in speech signals so that the signals may more economically be transmitted to a receiver station.

2. Discussion of the Prior Art

The aim of efficient coding methods is to reduce the channel capacity required to transmit a signal with specified fidelity. To achieve this objective, it is often essential to reduce the redundancy of the transmitted signal. One well-known procedure for reducing the signal redundancy is known as predictive coding. In predictive coding, redundancy is reduced by subtracting from the signal that part which can be predicted from its past. For many signals, the first order entropy of the difference signal is much smaller than the first order entropy of the original signal; thus, the difference signal is better suited to encoding for transmission than the original signal. Predictive coding thus offers a practical way of coding signals efficiently without requiring large capacity storage facilities.

One of the principal methods for efficiently encoding communication signals for transmission involves removing inherent signal redundancy through the use of prediction apparatus at both the transmitter and the receiver of a system. The current value of the signal is estimated at both locations by linear prediction based on the previously transmitted signals. The difference between this estimate and the true value of the signal is quantized, coded and transmitted to the receiver. At the receiver, the decoded difference signal is added to the predicted signal to reproduce the input speech signal. So long as a good prediction of the present signal value can be made, efficient coding may take place. However, speech is nonstationary so that a predictor with fixed coefficients fails to predict the value of a speech signal efficiently. For example, the speech signal is approximately periodic during voiced portions; thus, a good prediction of the present value can be based on the value of the signal exactly one period earlier. However, the period varies with time so that the predictor must change with the changing period of the input speech signal. Thus, since speech signal statistics are not constant, it is necessary that the prediction parameters be varied in accordance with the nature of the incoming signal to adapt the predictor to the needs of the signal.

SUMMARY OF THE INVENTION

In accordance with the present invention, redundancy in speech signals is reduced by predicting the present value of the signal from its past and by subtracting the predicted value from the present value. To accommodate the constantly changing character of speech, a form of predictive coding is employed in which the predictor adapts to changing signal conditions. For speech signals, past signal intervals are selected for prediction that are comparable to individual pitch periods. The extent of the period and the magnitude of the signal within the period are, in accordance with the invention, periodically redefined. Preferably, the parameter signals controlling the predictor are changed every 5 milliseconds. Such an interval has been found to be satisfactory for accommodating the time-varying characteristics of the input speech signal. The predictor parameter values are selected to minimize the power in the difference signal averaged over 5-msec. intervals. As such, the predictor parameter values constitute slowly varying signals which can be transmitted efficiently.

To prevent errors that might be introduced in the system as a result of the prediction and subsequent encoding processes from being accumulated, it is in accordance with the invention to reconstruct speech samples at the transmitter and to perform the prediction operation on reconstructed speech signals and not on input speech signals.

The difference between the present value signal and the predicted value of the signal, if any, is eventually encoded and transmitted to a receiver station together with the slowly varying parameter signals which characterize the prediction.

Unlike previously proposed speech coding methods, the predictive coding system of this invention accurately reproduces a speech waveform rather than its spectrum. Listening tests show that there is only slight, often imperceptible, degradation in the quality of speech reproduced after transmission. In addition, experiments indicate that binary difference signals and predictor parameter signals prepared in accordance with the invention together can be transmitted at rates of less than 10 kilobits per second, or several times less than the rate required for ordinary PCM encoding with comparable speech quality.

Since the difference signal developed in accordance with the invention is the result of continuous efficient prediction, it has low or zero intelligibility. It may be used, therefore, together with the signals representative of the parameters of the adaptive predictor, which are themselves unintelligible, to provide secure transmission of speech signals. Only a recipient of the signals with the appropriate decoding apparatus can properly reconstruct the signals. Moreover, as a feature of the invention, the predictor parameter signals may themselves be suitably scrambled for transmission. Only if the appropriate key to scrambling is known can they be properly recovered. Similarly, the difference signal and the parameter signal may be transmitted via independent channels as opposed to being multiplexed for transmission as a composite signal.

DESCRIPTION OF THE DRAWINGS

The invention will be fully apprehended from the following detailed description of a preferred illustrative embodiment thereof taken in connection with the appended drawings in which:

FIG. 1 is a block schematic diagram of a transmitter station which illustrates the principles of the invention;

FIG. 2 is a block schematic diagram of a receiver station constructed in accordance with the principles of the invention;

FIG. 3 is a block schematic diagram of an adaptive predictor suitable for use in the practice of the invention;

FIG. 4 is a block schematic diagram which illustrates the construction of a suitable predictor parameter computer,

FIG. 5 is a block schematic diagram of a suitable arrangement for computing the values of parameter α used for adjusting an adaptive predictor used in the practice of the invention,

FIG. 6 is a block schematic diagram of a transmitter station in accordance with the invention in which the difference signal and the several parameter signals are conveyed to a receiver station by separate transmission facilities, and

FIG. 7 is a block schematic diagram which illustrates the manner in which the parameter signals may be scrambled for transmission.

DETAILED DESCRIPTION

A predictive coding system for speech signals in accordance with the invention, includes: a transmitter (FIG. 1) for converting an input speech signal to a low-bit rate digital signal for transmission to a receiver; a predictor parameter computer (FIG. 4) to calculate the parameters of an adaptive predictor (FIG. 3); and a receiver (FIG. 2) to synthesize a speech signal from the received digital signal.

Transmitter

A block diagram of a transmitter which illustrates the principles of the predictive coding system of the invention is shown in FIG. 1. An input speech signal supplied at an input terminal is initially filtered in conventional low-pass filter 10 and sampled in sampling unit 11. In accordance with the well-known sampling theorem, the sampling rate is twice the cutoff frequency of the filter. A suitable sampling rate for speech signals is 6 kilohertz, so that low-pass filter 10 has a cutoff frequency of 3 kHz. Speech samples from sampler 11 are delayed by an interval of 60 samples (10 msec.) by delay line 12 and delivered to one terminal of differencing network 13, for example, a subtracting network. (Since sampler 11 converts the input speech signal to a sequence of brief samples, i.e., to digital form, it is appropriate to consider the operation of the circuit on a sample-by-sample basis).

A predicted value Z_{N} of the speech sample, obtained by predicting (in network 30) the present value of the signal on the basis, for example, of the value of past samples r_{N-1}, r_{N-2},..., is delivered to a second terminal of network 13. The difference, δ_{N}, between S_{N} and Z_{N}, if any, issuing from the network is next supplied to an adjustable gain amplifier 14 and altered in amplitude by a factor Q. The resultant signal is thereupon quantized to one of two levels, for example, in two-level quantizer 15. To provide for the construction of a predicted value of the signal, the signal developed by quantizer 15 is altered in amplitude by a factor 1/Q in amplifier 16 and supplied to one terminal of adder network 17. The predicted value Z_{N} is supplied to the other terminal of adder 17. The sum of these signals, designated r_{N}, forms the reconstructed signal. Note that at the Nth sampling instant, predictor 30 uses only earlier samples (N-1, N-2,...) of the reconstructed signal. The current sample r_{N} of the reconstructed signal is formed after the difference signal δ_{N} is quantized and added to the predicted value Z_{N}.
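The sample-by-sample loop formed by elements 13 through 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of FIG. 1: the function name `encode_frame`, the callable `predict`, and the choice of +1 and -1 as the two quantizer levels are all hypothetical, and Q is assumed to be positive.

```python
def encode_frame(samples, predict, Q, history):
    """Closed-loop predictive encoder for a run of samples (illustrative sketch).

    samples : input samples S_N
    predict : hypothetical callable returning Z_N from the list of earlier
              reconstructed samples (stands in for predictor 30)
    Q       : positive gain applied before the two-level quantizer
    history : list of earlier reconstructed samples r_{N-1}, r_{N-2}, ...
              (most recent last); it is extended in place
    Returns (bits, reconstructed).
    """
    bits, reconstructed = [], []
    for s in samples:
        z = predict(history)                 # Z_N uses earlier reconstructed samples only
        delta = s - z                        # differencing network 13
        bit = 1 if Q * delta >= 0 else -1    # amplifier 14, then two-level quantizer 15
        r = z + bit / Q                      # amplifier 16 (gain 1/Q), then adder 17
        history.append(r)                    # r_N becomes available for later predictions
        bits.append(bit)
        reconstructed.append(r)
    return bits, reconstructed
```

For example, with a trivial stand-in predictor that repeats the last reconstructed sample, `encode_frame([0.5, 0.7, 0.4], lambda h: h[-1] if h else 0.0, 4.0, [])` produces one bit per input sample together with the reconstructed values that a receiver would also obtain.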

Adaptive predictor 30, which may be of the form illustrated in FIG. 3, is periodically adapted to accommodate changing signal conditions, for example, in accordance with parameter signals developed in computer 40. Details of a suitable computer are discussed hereinafter with reference to FIG. 4. Predictor parameter computer 40 operates on signal samples supplied directly from sampler 11, and hence on signals in advance of their interaction in the differential operation, since signal S_{N} is delayed in unit 12 for a time sufficient to allow computer 40 to complete its operations. With the form of computer employed in the illustrative embodiment, it has been found that all computer operations can be completed in the time required for approximately 60 samples; signals supplied directly from sampler 11 thus are designated S_{N+60}. Parameter value signals, designated b, K and α, are thus developed to denote selected characteristics of the speech signal on the basis of intervals corresponding nominally to pitch periods of the signal. Parameter signal K represents the duration of a pitch period of the applied signal, and parameter b represents the relative amplitudes of corresponding signal values in contiguous pitch periods used in the prediction operation. Parameter signals α are amplitude factors related to the formant structure of the vocal-tract transmission function and to the spectral envelope of the vocal source. Similarly, computer 40 develops a signal, designated Q, representative of the gain of amplifiers 14 and 16 (and their counterpart at the receiver).

The binary signal at the output of quantizer 15, parameter signals for adjusting the predictor, and the signal Q, representative of the gain of amplifiers 14 and 16, thus constitute components of the transmitted signal. They may be combined for transmission to a distant station in any desired manner. For example, the binary signal at the output of quantizer 15 may be supplied directly to multiplexer 18 and the parameter value signals b, K and α, and the signal Q may also be supplied to multiplexer 18 for composite transmission to a receiver station. Alternatively, to achieve a degree of signal security, the several signals may be transmitted via independent channels as shown in FIG. 6. Moreover, additional security may be achieved by scrambling the parameter signals according to a known code prior to transmission. A suitable arrangement is illustrated in FIG. 7. Scramblers suitable for cryptically encoding signals are known to those in the art. It is obvious that scrambled parameter signals and the difference signal may be transmitted to a distant station in any desired fashion, for example, by multiplexing as illustrated in FIG. 1 or by transmission over diverse paths as shown in FIG. 6.

Receiver

A block diagram illustrating the various functions performed by a receiver constructed in accordance with the invention is shown in FIG. 2. Demultiplexer 21 serves to separate the various components of the composite signal received at an input terminal, namely, the quantized difference signal, signals denoting predictor parameters, and a signal representative of the gain of the amplifiers used at the transmitter. The predictor parameters are supplied to adaptive predictor 30', which may be identical in all respects to adaptive predictor 30 at the transmitter. The signal representative of the gain Q is supplied to amplifier 22 and decoded difference signals are delivered to amplifier 22. After being adjusted in gain by the factor 1/Q, the difference signal is added to a predicted value Z'_{N} of the present value of the signal developed at the receiver, for example, in adder network 23. The reconstructed samples r'_{N} are delivered to adaptive predictor 30' and also supplied by way of low-pass filter 24 to an output terminal. Low-pass filter 24, which has a cutoff frequency of one-half the sampling rate, smooths the supplied samples to produce an output speech signal r'(t). If there are no digital channel transmission errors, evidently predicted values Z'_{N} are identical to values Z_{N} predicted at the transmitter, since predictor 30' is adjusted identically to its counterpart 30 at the transmitter. Hence, reconstructed sample r'_{N} is virtually identical to r_{N} at the transmitter. It is apparent that the error between the reconstructed speech sample r_{N} and the input speech sample S_{N} is identical to the difference δ"_{N} - δ_{N} between the output of amplifier 16 and the input of amplifier 14. Since, on the average, the power of samples δ_{N} is much smaller than the power of samples S_{N}, the quantizing noise power in the reconstructed speech signal is a very small fraction of the power in the input speech signal. Output signal r'(t) is thus an extremely close approximation to the signal supplied as an input to the transmitter.
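The receiver loop can be sketched in the same way. The following Python fragment is a hedged illustration only: `decode_frame` and `predict` are hypothetical names, the received two-level signal is assumed to take the values +1 and -1, and the output low-pass filter 24 is omitted. Because the reconstructed sample is the predicted value plus the output of the 1/Q amplifier, while the input sample is the predicted value plus the unquantized difference, the reconstruction error equals the quantizing error stated in the text.

```python
def decode_frame(bits, predict, Q, history):
    """Receiver-side reconstruction (illustrative sketch).

    bits    : received two-level values, assumed to be +1 or -1
    predict : hypothetical callable standing in for adaptive predictor 30'
    Q       : gain signal received from the transmitter (assumed positive)
    history : earlier reconstructed samples, most recent last; extended in place
    Returns the list of reconstructed samples r'_N.
    """
    out = []
    for bit in bits:
        z = predict(history)    # Z'_N, computed from earlier reconstructed samples
        r = z + bit / Q         # amplifier 22 (gain 1/Q), then adder network 23
        history.append(r)       # fed back to the predictor for the next sample
        out.append(r)
    return out
```

If the same predictor and the same starting history are used at both ends and no channel errors occur, this loop returns the same reconstructed samples that the transmitter formed internally.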

Adaptive Predictor

Two of the main causes of redundancy in speech are (1) quasi-periodicity during voiced segments and (2) lack of flatness of the short-time spectral envelopes. In accordance with the invention, redundancy due to the quasi-periodic nature of speech is reduced by a linear predictor consisting, for example, of a delay and a gain. The z-transform of the predictor is given by

P_{1}(z) = b z^{-K}   (1)

where z^{-K} represents a delay of K samples and b is an amplitude factor. For voiced speech, delay K corresponds nominally to a pitch period. The factor b compensates for possible unequal amplitudes of the speech signal during contiguous pitch periods. During the onset of voicing, b is frequently greater than unity; the reverse is the case at the end of a voiced segment. For unvoiced speech sounds, b is ordinarily close to zero.
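In the time domain, the predictor of equation (1) simply scales the sample that occurred K samples earlier. A minimal Python sketch follows; the function name `pitch_prediction` and the list-based representation of past samples are assumptions made for illustration.

```python
def pitch_prediction(past, b, K):
    """Return b times the sample K steps back (time-domain form of b z^-K).

    past : list of earlier samples, most recent last
    b    : amplitude factor
    K    : delay in samples (nominally one pitch period for voiced speech)
    """
    if K < 1 or K > len(past):
        raise ValueError("K must lie between 1 and the number of stored samples")
    return b * past[-K]
```

For a signal that repeats exactly every K samples, the prediction with b = 1 equals the next sample, so the difference signal vanishes; real speech is only quasi-periodic, which is why b and K must be readjusted.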

Redundancy caused by the spectral envelope of speech is reduced, in accordance with the invention, by means of an eighth-order linear predictor. The z-transform of such a predictor is given by

P_{2}(z) = \sum_{m=1}^{8} α_{m} z^{-m}   (2)

An eighth-order linear predictor has been found to substantially reduce redundancies due to three formants of the vocal-tract transmission function and the spectral envelope of the vocal source.

An adaptive predictor, which is suitable for speech signals, and is in accordance with these considerations, is illustrated in FIG. 3. It consists essentially of two separate linear predictor systems, which exhibit transfer characteristics in accordance with equations (1) and (2), and means for combining them.

Reconstructed signal samples, r_{N} (delivered from adder network 17 of the transmitter and, correspondingly, from adder network 23 of the receiver), are delivered to storage unit 31. This unit is equipped to store a variable digital signal y_{n} for values of n=-120, -119,..., -1, 0, +1,..., +29. Thus, it has a storage capability of 150 digits. Of these, the last 30 digits are replaced every 5 msec. Every 5 msec., storage unit 31 is actuated, for example, by a pulse from clock 37, such that the signal in storage location y_{29} replaces the signal stored at location y_{-1}, the signal at y_{28} replaces the signal at y_{-2}, and so on. Thus, every 5 msec. a new group of samples is advanced into locations y_{-1},..., y_{-120} to constitute a stored sequence of "past" samples. The locations, y_{29},..., y_{0}, are vacated and made available to incoming reconstructed value signals r_{N} for the next 30 sample intervals.

During each 5 msec. interval, the values of y_{n} stored in locations y_{-1},..., y_{-120} are delivered sequentially to arithmetic unit 32 which is equipped to compute value, c_{1}, in accordance with equation (1a) for each value of y_{n}. Equation (1a), viz,

c_{1} = b y_{n-K}   (1a)

defines an input-output characteristic which corresponds to the form of equation (1), and specifies an output signal c_{1} for each supplied value of y_{n-K}. The necessary factors, b and K, are supplied to arithmetic unit 32 from predictor parameters computer 40 (FIG. 1). The resulting signals are delivered both to arithmetic unit 33 and to arithmetic unit 34.

Arithmetic unit 33 is programmed to develop values of u_{n} in accordance with the relation

u_{n} = r_{N} - c_{1}.   (3)

The momentary value of r_{N} is supplied to arithmetic unit 33 from the input to adaptive predictor 30. Evidently, arithmetic unit 33 comprises a simple subtractor network.

Similarly, storage unit 35 provides digital storage facility for a variable u_{n} for values of n=-8,..., -1, 0,..., +29. Unit 35 may be a shift register or the like. It is reset every 5 msec. by a pulse, for example, from clock 37, to shift the signals stored in locations +22 through +29 into the first eight locations and to free the locations 0 through +29 for incoming signals. The vacant storage locations are filled progressively with values of signal u_{n} developed by arithmetic unit 33. During each 5 msec. interval, values of u_{n} stored in unit 35 are delivered to arithmetic unit 36 which is arranged to compute values of c_{2} in accordance with equation (2a), as follows:

c_{2} = \sum_{m=1}^{8} α_{m} u_{n-m}   (2a)

Equation (2a) corresponds to the generalized relation of equation (2). In essence, arithmetic unit 36 is a cumulative multiplier network which sums the product of α and u for values of m=1 through m=8 for each value of u_{n} supplied from storage. The necessary amplitude factors α are supplied to unit 36 from predictor parameters computer 40 (FIG. 1). Computed values of c_{2} are delivered to arithmetic unit 34 wherein they are arithmetically added to values of c_{1} supplied by arithmetic unit 32 in accordance with equation (4) as follows:

Z_{N} = c_{1} + c_{2}.   (4)

Evidently, arithmetic unit 34 may comprise an adder network.

The resulting values of Z_{N} constitute the predicted value of the incoming speech signal sample S_{N} and are delivered, as an output signal, to subtractor network 13 of the transmitter (FIG. 1) and, correspondingly, to adder network 23 of the receiver (FIG. 2). The above-described arithmetic operations are carried on sequentially for each value of n from 0 through 29. In the above discussion, the integer N indicates the count of the current sample of the input signal, i.e., from sampler 11 (FIG. 1), minus 60 samples to take account of the 10 msec. delay. The integer n indicates a corresponding count within each unit. Variables r_{N} and u_{n} are consecutively stored in storage units 31 and 35, respectively, in locations 0 through 29. Every 5 msec., both storage units are reset, as described above, and consecutive samples of r_{N} are again stored in locations 0 through 29 in storage unit 31, and consecutive samples u_{n} are again stored in locations 0 through 29 in storage unit 35.
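The interplay of units 31 through 36 can be summarized in a short Python sketch. It is an illustration under assumptions rather than the apparatus of FIG. 3: the class name `AdaptivePredictor` and its methods are hypothetical, the two storage units are modeled as sliding lists instead of frame-shifted registers, and c_{2} is taken to be the sum of α_{m} times u_{n-m} for m = 1 through 8, as the description of arithmetic unit 36 indicates.

```python
class AdaptivePredictor:
    """Two-stage predictor: pitch term c1 plus short-term term c2 (sketch)."""

    def __init__(self, order=8, max_lag=120):
        self.r = [0.0] * max_lag     # past reconstructed samples (role of storage unit 31)
        self.u = [0.0] * order       # past values of u_n (role of storage unit 35)
        self.b = 0.0
        self.K = 1
        self.alpha = [0.0] * order
        self._c1 = 0.0               # c1 of the most recent prediction

    def set_parameters(self, b, K, alpha):
        """Install new parameter values, e.g. once per 5 msec. frame."""
        if not 1 <= K <= len(self.r):
            raise ValueError("K out of range")
        if len(alpha) != len(self.u):
            raise ValueError("wrong number of alpha values")
        self.b, self.K, self.alpha = b, K, list(alpha)

    def predict(self):
        """Return Z_N = c1 + c2 from stored past values only."""
        self._c1 = self.b * self.r[-self.K]                  # equation (1a)
        c2 = sum(a * self.u[-m]                              # sum over m = 1..order
                 for m, a in enumerate(self.alpha, start=1))
        return self._c1 + c2                                 # equation (4)

    def update(self, r_n):
        """Store the new reconstructed sample; call after predict()."""
        self.u = self.u[1:] + [r_n - self._c1]               # equation (3)
        self.r = self.r[1:] + [r_n]
```

Each sample interval calls `predict()` first and `update()` second, mirroring the rule that the predictor at instant N sees only samples N-1, N-2, and so on.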

Predictor Parameters Computer

Parameters for the adaptive predictors at the transmitter and receiver stations are calculated in special computation apparatus which may be of the form illustrated in FIG. 4. Such apparatus develops the predictor parameters necessary to adjust the predictor optimally despite the nonstationary, time-varying character of the input speech signals. Predictor parameters are recalculated every 5 msec. to ensure that prediction is efficient even when the speech characteristics are changing relatively fast.

Input speech samples, S_{N+60}, from sampler 11 (FIG. 1) are supplied to storage unit 41 which is equipped with sufficient storage capacity to accommodate an array w_{n} in a configuration identical to that described above. Incoming samples are thus stored in the array as w_{-120},..., w_{-1}, w_{0},..., w_{29}. The sample at location w_{0} = S_{M+60}, that at location w_{1} = S_{M+61},..., and so on through w_{29} = S_{M+89}, where M indicates the sample number of the first sample of the current "frame" of samples, i.e., samples in a 5 msec. group. Storage unit 41 is reset every 5 msec., for example, by a pulse from clock 37 of FIG. 3 (connections not shown for simplicity) such that w_{j} = w_{j+30} for all values of j=-120,..., -1. Accordingly, storage locations w_{0},..., w_{29} are vacated every 5 msec. and used to store the new samples incoming from sampler 11. The set of 30 newly installed samples constitutes a new frame of signals.
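The frame reset of storage unit 41 amounts to dropping the 30 oldest samples and appending the 30 newest. A minimal Python sketch follows, assuming the array w_{-120},..., w_{29} is held in a 150-element list with w_{j} at list position j + 120; the function name `advance_frame` and the constants are illustrative.

```python
PAST = 120    # stored locations w_-120 .. w_-1
FRAME = 30    # stored locations w_0 .. w_29 (one 5 msec. frame at 6 kHz)

def advance_frame(buf, new_samples):
    """Apply w_j = w_{j+30} for j = -120..-1, then install the new frame.

    buf         : list of PAST + FRAME values, buf[j + PAST] holding w_j
    new_samples : the FRAME newest samples, oldest first
    Returns the updated buffer as a new list.
    """
    if len(buf) != PAST + FRAME or len(new_samples) != FRAME:
        raise ValueError("unexpected buffer or frame length")
    return buf[FRAME:] + list(new_samples)
```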

Signals from storage unit 41 are supplied in parallel to arithmetic unit 42 wherein computational values χ_{j} according to equation (5), are computed as follows:

Arithmetic unit 42 includes individual computational units, 42a, 42b,..., 42n, which operate in parallel to compute χ_{j} according to the equation for values of j=15,..., 120. A special purpose computer programmed according to the equation may be employed to evaluate these signal values or, alternatively, several individual arithmetic operations, e.g., multiplication, summation, rooting, and division, may be performed serially according to techniques well known in the art.

The computed array of values of χ, i.e., χ_{15},..., χ_{120}, is supplied in parallel to peak locating network 43 wherein the largest value of χ is determined. Thus, peak locating network 43 finds the value of j such that χ_{j} is the maximum of all values of χ. Networks for picking the "biggest" from among a plurality of signals are well known in the art; a suitable one typically includes a progressively biased diode matrix. The index of the largest selected value of χ is designated K and is supplied as one parameter necessary to adjust the adaptive predictors at the transmitter and receiver locations.
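The search for K can be sketched in Python as follows. Equation (5) is not reproduced in this text, so the form of χ_{j} used here is an assumption: a normalized correlation between the current frame and the frame delayed by j samples, chosen because the text mentions multiplication, summation, rooting, and division. The function name `pitch_lag` and the list layout (w_{n} at position n + 120) are likewise illustrative.

```python
import math

def pitch_lag(w, offset=120, frame=30, min_lag=15, max_lag=120):
    """Return (K, chi_K): the lag in min_lag..max_lag with the largest chi_j.

    ASSUMED form (equation (5) is not available in this text):
        chi_j = sum(w_n * w_{n-j}) / sqrt(sum(w_n^2) * sum(w_{n-j}^2)),
    with all sums over n = 0 .. frame-1.
    w : list with w[n + offset] holding w_n for n = -offset .. frame-1
    """
    energy = sum(w[n + offset] ** 2 for n in range(frame))
    best_j, best_chi = min_lag, float("-inf")
    for j in range(min_lag, max_lag + 1):
        num = sum(w[n + offset] * w[n - j + offset] for n in range(frame))
        lagged = sum(w[n - j + offset] ** 2 for n in range(frame))
        den = math.sqrt(energy * lagged)
        chi = num / den if den > 0 else 0.0     # silent frame: treat chi as zero
        if chi > best_chi:                      # serial stand-in for peak locating network 43
            best_j, best_chi = j, chi
    return best_j, best_chi
```

With a strict comparison the smallest lag wins among exactly equal peaks, which is a design choice of this sketch and not something the text specifies.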

Parameter K is also supplied to B computer 44 and to arithmetic unit 45. Computer 44 is also supplied with the signals w_{n} from storage unit 41. It computes b from these data according to equation (6) as follows:

As before, a special purpose computer or a conventional arrangement of arithmetic units may be employed to evaluate b. The computed value of b similarly constitutes a parameter necessary for adjusting the adaptive predictors. Values of b are also supplied to arithmetic unit 45.

Arithmetic unit 45 is scheduled to develop an array of signal values u_{n} according to equation (7), below, for values of n=0,..., 29.

u_{n} = w_{n} - b w_{n-K},   n=0,..., 29.   (7)

Values of signals in the array w_{n} are supplied to arithmetic unit 45 from storage network 41.
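The work of computer 44 and arithmetic unit 45 can be sketched together in Python. Equation (6) is not reproduced in this text, so the expression for b below is an assumption: the least-squares gain that minimizes the power of w_{n} - b w_{n-K} over the frame, which matches the stated aim of minimizing the power in the difference signal. The residual follows equation (7) with the delay set to the selected pitch period K. The function name and list layout are illustrative.

```python
def pitch_gain_and_residual(w, K, offset=120, frame=30):
    """Return (b, u) for one frame.

    ASSUMED form of b (equation (6) is not available in this text):
        b = sum(w_n * w_{n-K}) / sum(w_{n-K}^2),  sums over n = 0 .. frame-1.
    u follows equation (7): u_n = w_n - b * w_{n-K}.
    w : list with w[n + offset] holding w_n
    """
    num = sum(w[n + offset] * w[n - K + offset] for n in range(frame))
    den = sum(w[n - K + offset] ** 2 for n in range(frame))
    b = num / den if den > 0 else 0.0     # no past energy: fall back to b = 0
    u = [w[n + offset] - b * w[n - K + offset] for n in range(frame)]
    return b, u
```

When the current frame is an exact scaled copy of the frame K samples earlier, this choice of b reproduces the scale factor and the residual u is zero throughout.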

The various computations outlined above are carried out serially in the order stated. The suboperations, e.g., the computation of values of χ in arithmetic unit 42, b in computer 44, and u_{n} in arithmetic unit 45, are carried out in parallel circuits within those units.

Every 5 msec. the array of signal values u_{n} is transferred into storage unit 46 to replace the previous arrays of signals in storage. Storage unit 46 thus stores an array of signal values u_{-8}, u_{-7},..., u_{-1}, u_{0},..., u_{29}. Every 5 msec. the values u_{-8},..., u_{-1} are replaced by the values u_{22},..., u_{29}. The incoming samples are placed in the vacated storage locations u_{0},..., u_{29}. Thus, the signals u_{0},..., u_{29}, are consecutively stored as they are received in storage unit 46.

Periodically, under the influence of clock signals, an array of signal values u_{n} is read out of storage unit 46 and transferred to arithmetic unit 47A. This unit comprises 36 arithmetic units designated f_{1,1}, f_{1,2},..., f_{1,8}; f_{2,2}, f_{2,3},..., f_{2,8}; f_{3,3},..., f_{8,8}, which operate in parallel. Each unit serves to compute one value of f according to equation (8) as follows:

Computations of f_{i,j} are carried on simultaneously and the output values, designated F, are periodically supplied to computer 48.

The array of signals u_{n} is also supplied to arithmetic unit 47B wherein an array of values is evaluated according to equation (9) as follows:

Arithmetic unit 47B preferably comprises an array of eight individual units operating in parallel to evaluate the several values of g. The resultant array, g_{1},..., g_{8}, designated G, is delivered every 5 msec. to computer 48.
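Equations (8) and (9) are not reproduced in this text. The Python sketch below therefore assumes the usual least-squares (covariance) forms, f_{i,j} as the sum over the frame of u_{n-i} u_{n-j} and g_{i} as the sum of u_{n} u_{n-i}; these forms are consistent with the 36 distinct units of arithmetic unit 47A, since a symmetric 8-by-8 array has 36 independent entries, but they remain an assumption. The function name and list layout (u_{n} at position n + 8) are illustrative.

```python
def covariance_terms(u, order=8, frame=30):
    """Return (F, G) for the matrix equation F * alpha = G.

    ASSUMED forms (equations (8) and (9) are not available in this text):
        f_{i,j} = sum(u_{n-i} * u_{n-j}),  g_i = sum(u_n * u_{n-i}),
    sums over n = 0 .. frame-1, with i, j = 1 .. order.
    u : list with u[n + order] holding u_n for n = -order .. frame-1
    """
    off = order
    F = [[sum(u[n - i + off] * u[n - j + off] for n in range(frame))
          for j in range(1, order + 1)]
         for i in range(1, order + 1)]
    G = [sum(u[n + off] * u[n - i + off] for n in range(frame))
         for i in range(1, order + 1)]
    return F, G
```

This sketch fills the full square array for clarity; only the upper triangle needs to be computed in practice because f_{i,j} equals f_{j,i}.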

Computer 48 is programmed to solve the matrix equation

Fα=G (10)

to yield values of α. Although any special purpose computer may be programmed for this evaluation, one suitable arrangement is described below with reference to FIG. 5. Suffice it to say at this point that the output of computer 48 is an array α of signal values α_{1}, α_{2},..., α_{8}, which constitutes parameter values necessary for adjusting the adaptive predictors at the transmitter and receiver stations. These signals are thus applied directly to adaptive predictor 30 at the transmitter (FIG. 1), and to multiplexer 18 at the transmitter for delivery to the receiver and adaptive predictor 30'.

Array α is also delivered to Q computer 49. Computer 49 constitutes an arithmetic unit arranged to evaluate values of Q according to the relation

Arithmetic units for obtaining products, summations, differentials, absolute values and so on, are well known to those skilled in the art. Values of Q thus evaluated are used both at the transmitter and at the receiver to set the gains of the several adjustable gain amplifiers used in the predictive networks. At the transmitter, values of signal Q are used to set the gains of amplifiers 14 and 16; at the receiver to set the gain of amplifier 22.

Although the several individual processing steps required to evaluate the various intermediate parameter values take place sequentially in the apparatus of FIG. 4, it is evident that essentially instantaneous processing takes place in the various computational units within each frame interval. The various sets of parameter signals are advanced, one unit to the next, for example, in accordance with pulses from a clock (such as clock 37 in FIG. 3).

The various predictor parameters and gain factor Q are recalculated every 5 msec. These calculated values are held fixed for a duration of 5 msec., the period over which the predictor parameters have been optimized. Due to the 10 msec. delay of incoming signals at the transmitter, the predictor parameters computer calculates the parameters ahead of the time they are needed at the transmitter. The adaptive predictors are reset just before the arrival of the first speech sample of each frame at the transmitter.

Parameter α Computer

Operations sufficient for evaluating α in accordance with equation (10) are described, for example, at pages 145-146 of "Computational Methods of Linear Algebra" by D. K. Faddeev and V. N. Faddeeva (English translation by R. C. Williams, published by W. H. Freeman & Co., San Francisco, 1963). Although conventional operations as described in the literature may be employed, one arrangement that has been found particularly suitable is illustrated in the block schematic diagram of FIG. 5.

In FIG. 5, the signals of array F, representative of values of f developed in arithmetic unit 47A, are supplied, respectively, to arithmetic units 51. The system of arithmetic units operates on the supplied values of f to produce an array of modified functions, designated h_{i,j}, for values of i = 1, ..., 8, and for values of j = i, ..., 8. Values of h are individually stored in storage unit 52. Arithmetic unit 51a, for example, develops a value of h_{1,1} in accordance with equation (12), as follows:

h_{1,1} = \sqrt{f_{1,1}}    (12)

Apparently arithmetic unit 51a comprises a square rooting device. Values of h_{1,2}, ..., h_{1,8} are evaluated in arithmetic unit 51b in accordance with the relation shown in equation (13), viz,

h_{1,j} = f_{1,j} / h_{1,1},    j = 1, ..., 8.    (13)

Evidently, arithmetic unit 51b comprises a plurality of individual units for developing a quotient signal. The necessary value of h_{1,1} is delivered to arithmetic unit 51b from storage unit 52.

In like manner, values of h_{i,j} are sequentially derived in arithmetic units 51, progressing from left to right in the drawing, according to the relation: ##SPC1##

Evidently, units shown in the drawing and designated 51c, 51e, 51g, make the evaluations according to equation (14), i.e., for values of h_{2,2}, h_{3,3}, ..., h_{8,8}. The remaining evaluations, according to equation (15), are made in units 51d, 51f, and so on.

It is, of course, possible that values of h_{i,j} are at times zero. Hence, to avoid any ambiguity in evaluating functions according to equations (14) and (15), it is in accordance with the invention to prescribe an arbitrary rule to accommodate this situation. According to the selected rule, an increment ε is added to each input for a detected input of zero; any similar rule may, of course, be used. As a result, the input always assumes a finite value and the required division operation can take place. A small signal ε, derived for example from battery 53, is supplied to adders 54 in the input circuits of the requisite arithmetic units 51. The magnitude of ε is selected in accordance with the relative signal magnitudes accommodated by units 51, to be insignificant as far as signal evaluation is concerned, but sufficient to avoid the divide-by-zero ambiguity. If desired, switch 55 may be used to open the ε circuit except when zero signal is detected.
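By way of illustration only, the factoring performed by arithmetic units 51 may be sketched in software. The sketch assumes the standard square-root (Cholesky) recurrences of the cited reference, which reduce to equations (12) and (13) for the first row; it is not asserted to reproduce equations (14) and (15) exactly. The function name `square_root_factor`, the default value of `eps`, the substitution of `eps` for a zero divisor, and the clamping of slightly negative radicands are assumptions made for the example.

```python
import math

def square_root_factor(f, eps=1e-9):
    """Return an upper-triangular h such that f = h^T h (square-root method).

    f is a symmetric matrix given as a list of rows. eps is a hypothetical
    stand-in for the small increment supplied when a zero is detected.
    """
    n = len(f)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal term: subtract the squares already accounted for in column i.
        radicand = f[i][i] - sum(h[k][i] ** 2 for k in range(i))
        diagonal = math.sqrt(max(radicand, 0.0))  # clamp is an assumption
        if diagonal == 0.0:
            diagonal = eps  # avoid the divide-by-zero ambiguity
        h[i][i] = diagonal
        # Off-diagonal terms of row i: each is a quotient by the diagonal term.
        for j in range(i + 1, n):
            numerator = f[i][j] - sum(h[k][i] * h[k][j] for k in range(i))
            h[i][j] = numerator / h[i][i]
    return h

example_h = square_root_factor([[4.0, 2.0], [2.0, 5.0]])
```

For the 2-by-2 example, the first row follows equations (12) and (13) directly: h_{1,1} is the square root of 4 and h_{1,2} is 2 divided by h_{1,1}.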

The computed values of h are supplied to arithmetic units 56 together with values of G (from arithmetic unit 47B) and functions p_{i} are developed as follows: ##SPC2## The array of values of p, viz, p_{1},..., p_{8}, is stored in storage apparatus 57 and supplied as required to arithmetic units 58 wherein an array of signal values α is developed for values of i=1 ,..., 8 according to the equation ##SPC3## The necessary values of h for this evaluation are supplied from storage unit 52. The resulting array of values α is delivered to storage apparatus 59. Periodically this array is delivered to arithmetic unit 49 for the evaluation of Q and to the adaptive predictors 30 and 30' (FIG. 3).
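The two-stage evaluation described above, in which intermediate values p are first developed from h and G and values of α are then developed from h and p, corresponds in the standard square-root method to a forward substitution followed by a back substitution. A minimal sketch under that assumption follows; the function name `solve_square_root_method` is hypothetical, and the recurrences are the textbook ones rather than a transcription of the relations used in units 56 and 58.

```python
import math

def solve_square_root_method(f, g):
    """Solve f * a = g for a symmetric positive-definite f (list of rows)."""
    n = len(f)
    # Factor f = h^T h with h upper triangular.
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = math.sqrt(f[i][i] - sum(h[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            h[i][j] = (f[i][j] - sum(h[k][i] * h[k][j] for k in range(i))) / h[i][i]
    # Forward substitution: solve h^T p = g, from the first element to the last.
    p = [0.0] * n
    for i in range(n):
        p[i] = (g[i] - sum(h[k][i] * p[k] for k in range(i))) / h[i][i]
    # Back substitution: solve h a = p, from the last element to the first.
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (p[i] - sum(h[i][k] * a[k] for k in range(i + 1, n))) / h[i][i]
    return a

example_a = solve_square_root_method([[4.0, 2.0], [2.0, 5.0]], [8.0, 12.0])
```

In the example, the system 4a_{1} + 2a_{2} = 8, 2a_{1} + 5a_{2} = 12 has the solution a_{1} = 1, a_{2} = 2.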

It is apparent that the apparatus described herein represents merely one suitable manner of carrying out the necessary operations to adaptively predict the values of a speech signal to promote efficient coding for transmission. Numerous alternative techniques may be employed for the evaluation; in fact, many of the operations may be programmed for evaluation by a special purpose computer. Moreover, the signals prepared for transmission may be combined in any desired fashion or, in the alternative, may be transmitted separately to achieve secure transmission of the speech signals.

It is of interest that the quantizing noise appearing at the output of the receiver, as described in this invention, is essentially white in nature (flat spectrum). Frequently it is desirable that quantizing noise have a nonflat spectrum. For example, noise whose spectrum is weighted down at high frequencies may be subjectively less annoying. Any desired noise spectral characteristics can be obtained by employing a suitable preemphasis network before low-pass filter 10 in the transmitter and a deemphasis network after low-pass filter 24 at the receiver. A suitable preemphasis characteristic for speech signals is one which is flat up to about 500 Hz and rises at about 10 dB per octave between 500 and 3000 Hz. It is not necessary that a preemphasis network be used prior to low-pass filtering. It may, for example, be used just after the sampler 11. Similarly, the deemphasis network may be used just prior to low-pass filter 24 in the receiver.
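The complementary action of the two networks may be illustrated with a sampled-data sketch. The first-order difference filter below and its coefficient `c = 0.9` are assumptions chosen for simplicity; they do not realize the specific preemphasis characteristic described above. The sketch shows only that the deemphasis operation restores a signal to which the matching preemphasis was applied.

```python
def preemphasize(x, c=0.9):
    """Boost high frequencies: y[n] = x[n] - c * x[n-1] (hypothetical filter)."""
    out, previous = [], 0.0
    for sample in x:
        out.append(sample - c * previous)
        previous = sample
    return out

def deemphasize(y, c=0.9):
    """Inverse operation: x[n] = y[n] + c * x[n-1]."""
    out, previous = [], 0.0
    for sample in y:
        previous = sample + c * previous
        out.append(previous)
    return out

original = [1.0, 2.0, -1.0, 0.5]
restored = deemphasize(preemphasize(original))
```

Because noise introduced between the two networks passes through the deemphasis network only, its high-frequency components are attenuated relative to those of the signal.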

In all events, the above-described arrangements are merely illustrative of the application of the principles of the invention. Numerous arrangements may be devised by those skilled in the art without, however, departing from the spirit and scope of the invention.

This invention relates to the efficient encoding of communication signals and to the reduction of the channel capacity required to transmit them. More particularly, it relates to the predictive coding of speech signals. It has for its object a reduction of redundancy in speech signals so that the signals may more economically be transmitted to a receiver station.

1. Field of the Invention

The aim of efficient coding methods is to reduce the channel capacity required to transmit a signal with specified fidelity. To achieve this objective, it is often essential to reduce the redundancy of the transmitted signal. One well-known procedure for reducing the signal redundancy is known as predictive coding. In predictive coding, redundancy is reduced by subtracting from the signal that part which can be predicted from its past. For many signals, the first order entropy of the difference signal is much smaller than the first order entropy of the original signal; thus, the difference signal is better suited to encoding for transmission than the original signal. Predictive coding thus offers a practical way of coding signals efficiently without requiring large capacity storage facilities.
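The entropy claim may be illustrated numerically. In the sketch below, a slowly varying test signal (a quantized sinusoid, chosen arbitrarily) is predicted by its previous sample, and the first order entropy of the difference is compared with that of the signal itself. The test signal, the previous-sample predictor, and the function name `first_order_entropy` are assumptions made for the example.

```python
import math
from collections import Counter

def first_order_entropy(symbols):
    """Entropy in bits per symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Slowly varying integer-valued test signal (assumed for illustration).
signal = [round(100 * math.sin(2 * math.pi * n / 200)) for n in range(2000)]

# Predict each sample by the previous one; the first sample is predicted as 0.
difference = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]

signal_entropy = first_order_entropy(signal)
difference_entropy = first_order_entropy(difference)
```

The difference signal takes on only a handful of small values, so its first order entropy is well below that of the original signal, which is spread over many amplitude levels.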

2. Discussion of the Prior Art

One of the principal methods for efficiently encoding communication signals for transmission involves removing inherent signal redundancy through the use of prediction apparatus at both the transmitter and the receiver of a system. The current value of the signal is estimated at both locations by linear prediction based on the previously transmitted signals. The difference between this estimate and the true value of the signal is quantized, coded and transmitted to the receiver. At the receiver, the decoded difference signal is added to the predicted signal to reproduce the input speech signal. So long as a good prediction of the present signal value can be made, efficient coding may take place. However, speech is nonstationary so that a predictor with fixed coefficients fails to predict the value of a speech signal efficiently. For example, the speech signal is approximately periodic during voiced portions; thus, a good prediction of the present value can be based on the value of the signal exactly one period earlier. However, the period varies with time so that the predictor must change with the changing period of the input speech signal. Thus, since speech signal statistics are not constant, it is necessary that the prediction parameters be varied in accordance with the nature of the incoming signal to adapt the predictor to the needs of the signal.

SUMMARY OF THE INVENTION

In accordance with the present invention, redundancy in speech signals is reduced by predicting the present value of the signal from its past and by subtracting the predicted value from the present value. To accommodate the constantly changing character of speech, a form of predictive coding is employed in which the predictor adapts to changing signal conditions. For speech signals, past signal intervals are selected for prediction that are comparable to individual pitch periods. The extent of the period and the magnitude of the signal within the period are, in accordance with the invention, periodically redefined. Preferably, the parameter signals controlling the predictor are changed every 5 milliseconds. Such an interval has been found to be satisfactory for accommodating the time-varying characteristics of the input speech signal. The predictor parameter values are selected to minimize the power in the difference signal averaged over 5-msec. intervals. As such, the predictor parameter values constitute slowly varying signals which can be transmitted efficiently.
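The selection of a pitch-related delay and gain that minimize difference-signal power over one interval may be sketched as a least-squares search. The following is a minimal illustration, not the computation of the apparatus: the function name `best_pitch_predictor`, the search range, and the test signal are assumptions, and the gain formula is the standard least-squares solution for a one-tap predictor.

```python
import math

def best_pitch_predictor(s, start, length, k_min, k_max):
    """Return (delay, gain, energy) minimizing the sum over the frame of
    (s[n] - gain * s[n - delay])**2 for delays in [k_min, k_max]."""
    best = None
    frame = range(start, start + length)
    for k in range(k_min, k_max + 1):
        numerator = sum(s[n] * s[n - k] for n in frame)
        denominator = sum(s[n - k] ** 2 for n in frame)
        if denominator == 0.0:
            continue  # no past signal energy at this delay
        gain = numerator / denominator  # least-squares gain for this delay
        energy = sum((s[n] - gain * s[n - k]) ** 2 for n in frame)
        if best is None or energy < best[2]:
            best = (k, gain, energy)
    return best

# Test signal with a period of 40 samples (assumed for illustration).
speech_like = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40)
               for n in range(300)]

# One 30-sample frame (5 msec. at a 6-kHz sampling rate).
delay, gain, energy = best_pitch_predictor(speech_like, 200, 30, 20, 60)
```

For this periodic test signal the search settles on a delay of one period with a gain close to unity, leaving essentially no power in the difference signal.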

To prevent errors that might be introduced in the system as a result of the prediction and subsequent encoding processes from being accumulated, it is in accordance with the invention to reconstruct speech samples at the transmitter and to perform the prediction operation on reconstructed speech signals and not on input speech signals.
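The effect of predicting from reconstructed samples may be illustrated with a simple closed-loop coder. The sketch below uses a fixed previous-sample predictor and a uniform quantizer with a hypothetical step size in place of the adaptive predictor and the quantizer of the invention; the names `encode` and `decode` are likewise assumptions. Since the transmitter and receiver form their predictions from identical reconstructed values, the quantizing error of one sample is not carried into the next.

```python
import math

def encode(samples, step):
    """Quantize the difference between each sample and a prediction formed
    from the previously reconstructed sample (not the input sample)."""
    codes, reconstructed = [], []
    prediction = 0.0
    for s in samples:
        code = round((s - prediction) / step)  # quantized difference index
        value = prediction + code * step       # reconstruction at transmitter
        codes.append(code)
        reconstructed.append(value)
        prediction = value                     # predict from reconstruction
    return codes, reconstructed

def decode(codes, step):
    """Rebuild the signal at the receiver from the difference codes alone."""
    output = []
    prediction = 0.0
    for code in codes:
        value = prediction + code * step
        output.append(value)
        prediction = value
    return output

step = 0.05
samples = [math.sin(0.1 * n) for n in range(200)]
codes, reconstructed = encode(samples, step)
decoded = decode(codes, step)
worst_error = max(abs(s - d) for s, d in zip(samples, decoded))
```

The receiver output matches the transmitter's reconstruction sample for sample, and the error in each output sample remains bounded by half a quantizing step however long the sequence.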

The difference between the present value signal and the predicted value of the signal, if any, is eventually encoded and transmitted to a receiver station together with the slowly varying parameter signals which characterize the prediction.

Unlike previously proposed speech coding methods, the predictive coding system of this invention accurately reproduces a speech waveform rather than its spectrum. Listening tests show that there is only slight, often imperceptible, degradation in the quality of speech reproduced after transmission. In addition, experiments indicate that binary difference signals and predictor parameter signals prepared in accordance with the invention together can be transmitted at rates of less than 10 kilobits per second, or several times less than the rate required for ordinary PCM encoding with comparable speech quality.

Since the difference signal developed in accordance with the invention is the result of continuous efficient prediction, it has low or zero intelligibility. It may be used, therefore, together with the signals representative of the parameters of the adaptive predictor, which are themselves unintelligible, to provide secure transmission of speech signals. Only a recipient of the signals with the appropriate decoding apparatus can properly reconstruct the signals. Moreover, as a feature of the invention, the predictor parameter signals may themselves be suitably scrambled for transmission. Only if the appropriate key to scrambling is known can they be properly recovered. Similarly, the difference signal and the parameter signal may be transmitted via independent channels as opposed to being multiplexed for transmission as a composite signal.

DESCRIPTION OF THE DRAWINGS

The invention will be fully apprehended from the following detailed description of a preferred illustrative embodiment thereof taken in connection with the appended drawings in which:

FIG. 1 is a block schematic diagram of a transmitter station which illustrates the principles of the invention;

FIG. 2 is a block schematic diagram of a receiver station constructed in accordance with the principles of the invention;

FIG. 3 is a block schematic diagram of an adaptive predictor suitable for use in the practice of the invention;

FIG. 4 is a block schematic diagram which illustrates the construction of a suitable predictor parameter computer;

FIG. 5 is a block schematic diagram of a suitable arrangement for computing the values of parameter α used for adjusting an adaptive predictor used in the practice of the invention;

FIG. 6 is a block schematic diagram of a transmitter station in accordance with the invention in which the difference signal and the several parameter signals are conveyed to a receiver station by separate transmission facilities; and

FIG. 7 is a block schematic diagram which illustrates the manner in which the parameter signals may be scrambled for transmission.

DETAILED DESCRIPTION

A predictive coding system for speech signals in accordance with the invention includes: a transmitter (FIG. 1) for converting an input speech signal to a low-bit rate digital signal for transmission to a receiver; a predictor parameter computer (FIG. 4) to calculate the parameters of an adaptive predictor (FIG. 3); and a receiver (FIG. 2) to synthesize a speech signal from the received digital signal.

Transmitter

A block diagram of a transmitter which illustrates the principles of the predictive coding system of the invention is shown in FIG. 1. An input speech signal supplied at an input terminal is initially filtered in conventional low-pass filter 10 and sampled in sampling unit 11. In accordance with the well-known sampling theorem, the sampling rate is twice the cutoff frequency of the filter. A suitable sampling rate for speech signals is 6 kilohertz, so that low-pass filter 10 has a cutoff frequency of 3 kHz. Speech samples from sampler 11 are delayed by an interval of 60 samples (10 msec.) by delay line 12 and delivered to one terminal of differencing network 13, for example, a subtracting network. (Since sampler 11 converts the input speech signal to a sequence of brief samples, i.e., to digital form, it is appropriate to consider the operation of the circuit on a sample-by-sample basis.)
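The timing relationships implied by these figures may be checked with a few lines of arithmetic. The sketch combines the 6-kHz sampling rate and 60-sample delay stated above with the 5-msec. parameter interval and the 10-kilobit-per-second bound given in the summary. The assumption that the binary difference signal occupies exactly one bit per sample, and all variable names, are made for illustration only.

```python
SAMPLING_RATE_HZ = 6000   # sampling rate of sampler 11
DELAY_SAMPLES = 60        # delay introduced by delay line 12
FRAME_MS = 5              # interval at which predictor parameters change
TOTAL_RATE_BPS = 10_000   # upper bound on the transmission rate

cutoff_hz = SAMPLING_RATE_HZ / 2                          # 3000.0
delay_ms = 1000 * DELAY_SAMPLES / SAMPLING_RATE_HZ        # 10.0
samples_per_frame = SAMPLING_RATE_HZ * FRAME_MS // 1000   # 30
frames_per_second = 1000 // FRAME_MS                      # 200

# Assumption: one bit per sample for the binary difference signal.
difference_rate_bps = SAMPLING_RATE_HZ * 1                # 6000

# Bits per frame left for parameter signals under the stated bound.
parameter_bits_per_frame = (TOTAL_RATE_BPS - difference_rate_bps) / frames_per_second  # 20.0
```

Under these assumptions each 5-msec. interval spans 30 samples, the 10-msec. delay spans two such intervals, and fewer than 20 bits per interval remain for the parameter signals if the total rate is to stay below 10 kilobits per second.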

A predicted value Z

Adaptive predictor 30, which may be of the form illustrated in FIG. 3, is periodically adapted to accommodate changing signal conditions, for example, in accordance with parameter signals developed in computer 40. Details of a suitable computer are discussed hereinafter with reference to FIG. 4. Predictor parameter computer 40 operates on signal samples supplied directly from sampler 11, and hence on signals in advance of their interaction in the differential operation, since signal S

The binary signal at the output of quantizer 15, parameter signals for adjusting the predictor, and the signal Q, representative of the gain of amplifiers 14 and 16, thus constitute components of the transmitted signal. They may be combined for transmission to a distant station in any desired manner. For example, the binary signal at the output of quantizer 15 may be supplied directly to multiplexer 18 and the parameter value signals b, K and α, and the signal Q may also be supplied to multiplexer 18 for composite transmission to a receiver station. Alternatively, to achieve a degree of signal security, the several signals may be transmitted via independent channels as shown in FIG. 6. Moreover, additional security may be achieved by scrambling the parameter signals according to a known code prior to transmission. A suitable arrangement is illustrated in FIG. 7. Scramblers suitable for cryptically encoding signals are known to those in the art. It is obvious that scrambled parameter signals and the difference signal may be transmitted to a distant station in any desired fashion, for example, by multiplexing as illustrated in FIG. 1 or by transmission over diverse paths as shown in FIG. 6.

Receiver

A block diagram illustrating the various functions performed by a receiver constructed in accordance with the invention is shown in FIG. 2. Demultiplexer 21 serves to separate the various components of the composite signal received at an input terminal, namely, the quantized difference signal, signals denoting predictor parameters, and a signal representative of the gain of the amplifiers used at the transmitter. The predictor parameters are supplied to adaptive predictor 30', which may be identical in all respects to adaptive predictor 30 at the transmitter. The signal representative of the gain Q is supplied to amplifier 22 and decoded difference signals are delivered to amplifier 22. After being adjusted in gain by the factor 1/Q, the difference signal is added to a predicted value Z

Adaptive Predictor

Two of the main causes of redundancy in speech are (1) quasi-periodicity during voiced segments and (2) lack of flatness of the short-time spectral envelopes. In accordance with the invention, redundancy due to the quasi-periodic nature of speech is reduced by a linear predictor consisting, for example, of a delay and a gain. The z-transform of the predictor is given by

p

where z

Redundancy caused by the spectral envelope of speech is reduced, in accordance with the invention, by means of an eighth-order linear predictor. The z-transform of such a predictor is given by

An eighth-order linear predictor has been found to substantially reduce redundancies due to three formants of the vocal-tract transmission function and the spectral envelope of the vocal source.

An adaptive predictor, which is suitable for speech signals, and is in accordance with these considerations, is illustrated in FIG. 3. It consists essentially of two separate linear predictor systems, which exhibit transfer characteristics in accordance with equations (1) and (2), and means for combining them.

Reconstructed signal samples, r

During each 5 msec. interval, the values of y

c

defines an input-output characteristic which corresponds to the form of equation (1), and specifies an output signal c

Arithmetic unit 33 is programmed to develop values of u

u

The momentary value of r

Similarly, storage unit 35 provides digital storage facility for a variable u

Equation (2a) corresponds to the generalized relation of equation (2). In essence, arithmetic unit 36 is a cumulative multiplier network which sums the product of α and u for values of m=1 through m=8 for each value of u

Z

Evidently, arithmetic unit 34 may comprise an adder network.

The resulting values of Z

Predictor Parameters Computer

Parameters for the adaptive predictors at the transmitter and receiver stations are calculated in special computation apparatus which may be of the form illustrated in FIG. 4. Such apparatus develops the predictor parameters necessary to adjust the predictor optimally despite the nonstationary, time-varying character of the input speech signals. Predictor parameters are recalculated every 5 msec. to ensure that prediction is efficient even when the speech characteristics are changing relatively fast.

Input speech samples, S

Signals from storage unit 41 are supplied in parallel to arithmetic unit 42 wherein computational values χ

Arithmetic unit 42 includes individual computational units, 42a, 42b ,..., 42n, which operate in parallel to compute χ

The computed array of values of χ, i.e., χ

Parameter K is also supplied to B computer 44 and to arithmetic unit 45. Computer 44 is also supplied with the signals w

As before, a special purpose computer or a conventional arrangement of arithmetic units may be employed to evaluate b. The computed value of b similarly constitutes a parameter necessary to adjusting the adaptive predictors. Values of b are also supplied to arithmetic unit 45.

Arithmetic unit 45 is scheduled to develop an array of signal values u

u

n=0,..., 29. (7)

Values of signals in the array w

The various computations outlined above are carried out serially in the order stated. The suboperations, e.g., the computation of values of χ in arithmetic unit 42, b in computer 44, and u

Every 5 msec. the array of signal values u

Periodically, under the influence of clock signals, an array of signal values u

Computations of f

The array of signals u

Arithmetic unit 47B preferably comprises an array of eight individual units operating in parallel to evaluate the several values of g. The resultant array, g

Computer 48 is programmed to solve the matrix equation

Fα=G (10)
