Speech-synthesis system
United States Patent 3869575

The present invention relates to a speech synthesis system in which the acoustic message items are stored optically. The system according to the invention comprises an optical store including a plurality of diffractive elements of holographic type. Each element is constructed to project selectively an image carrying acoustic information in optical form; each projected image is converted into electrical speech signals by an image converter tube associated with a loudspeaker. Image selection and analysis are controlled by addresses supplied by a computer.

Spitz, Erich (Paris, FR)
Rembault, Michel (Paris, FR)
Other Classes:
359/25, 365/106, 704/E13.005
International Classes:
G10L13/04; G10L13/08; G10L15/24; (IPC1-7): G10L1/00; G11C13/04
Field of Search:
179/1SA,1SM,1SP,1VS,1VC,1SF,1SG 340
Other References:

Vitols, V. A., "Hologram Memory," IBM Tech. Disclosure Bulletin, Vol. 8, No. 11, April 1966.
Primary Examiner:
Cooper, William C.
Assistant Examiner:
Kemeny, E. S.
Attorney, Agent or Firm:
Cushman, Darby And Cushman
1. Speech synthesis system for synthesising acoustic messages made up of a succession of elementary acoustic signals under the control of information data including addresses and a command signal for each of said acoustic signals; said system comprising:

2. Speech synthesis system as claimed in claim 1, wherein said analyser

3. Speech synthesis system for synthesising acoustic messages made up of a succession of elementary acoustic signals under the control of information data including addresses for each of said acoustic signals; said system comprising:

4. Speech synthesis system for synthesising acoustic messages made up of a succession of elementary acoustic signals under the control of information data including addresses for each of said acoustic signals; said system comprising:

The present invention relates to systems designed to synthesise acoustic messages, and more particularly to the use in such systems of a fast-access optical store with a high storage capacity.

A typical application of such systems is found in conversion units that transform electrical data supplied by a computer into acoustic signals simulating speech.

The acoustic messages synthesised by the above-mentioned units are generally built up from previously stored words or phonetic elements, which are reconstituted in a predetermined order under the control of a computer program.

In some embodiments, the vocabulary takes the form of numerical data stored in a direct-access memory of the computer, and the conversion unit transforms these data into acoustic signals generated by a multichannel synthesiser of the vocoder type. Unfortunately, the storage capacity these systems require rapidly becomes prohibitive.

In other embodiments, the elements of the acoustic message are recorded in analog form so that they can be read out directly. The data carrier may, for example, be a magnetic drum, a disc or a film. The difficulty of reducing the access time to the message elements is one of the chief drawbacks of these systems.

The object of the present invention is to overcome these drawbacks by using the holographic technique as a means of storing the acoustic signals, or simply as a means of rapid access to the stored information.

According to the invention, there is provided a speech synthesis system for synthesising acoustic messages made up of a succession of elementary acoustic signals, under the control of information data including addresses for each of said acoustic signals, said synthesis system comprising: holographic storage means including a plurality of elementary fringe patterns arranged side by side on a support; illuminating means, controlled by said addresses, for directing a beam of monochromatic radiant energy toward any one of said fringe patterns; image analyser means having an input face positioned to receive from any one of said fringe patterns a reconstructed image of said acoustic signals when that pattern is illuminated by said beam; and electro-acoustic means having an electrical input coupled to the electrical output of said image analyser means, for delivering said acoustic messages.

For a better understanding of the invention, and to show how it may be carried into effect, reference will be made to the following description and to the appended drawings, in which:

FIG. 1 is a functional diagram illustrating the various components of a speech-synthesis device in accordance with the invention.

FIGS. 2 and 3 are first and second examples of a system in accordance with the invention.

FIGS. 4 and 5 are explanatory illustrations of devices for constructing the holograms.

As shown in FIG. 1, a storage block 1 contains, in optical form, all the information items required for speech synthesis. In accordance with the invention, and as described in more detail with reference to the following figures, the storage block 1 comprises a plurality of diffractive elements of holographic type; each diffractive element is designed to reconstitute, by projecting an image, an elementary component of the acoustic message, and is identified by its position, i.e. by its address.

The sequence of information items which, juxtaposed in time, form the desired acoustic message is determined by a program supplied by a computer 2.

The addresses produced by the computer 2 control scanning means 3, which selectively illuminate the regions of the store 1 that hold the various information items. When the store 1 receives localised illumination from the scanned beam source 3, it projects an image onto the input face of an optical-electrical analyser 6; the electrical signal produced by the analyser 6 in response to the received image is applied to electro-acoustic means comprising an audio amplifier 9 and a loudspeaker 10.
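The signal flow of FIG. 1 can be sketched in Python as a minimal model. It assumes that each projected image is a small grid of transmittance values and that the analyser maps a transmittance t linearly to a sample 2t - 1. The names `synthesise` and `analyse`, and the linear law, are hypothetical. They only illustrate the order of operations (address, illumination, projection, analysis, audio output) and do not form part of the described apparatus.

```python
from typing import Dict, List, Sequence, Tuple

Address = Tuple[int, int]   # hypothetical (x, y) address of an element of store 1
Image = List[List[float]]   # projected image: lines of transmittance values in [0, 1]


def analyse(image: Image) -> List[float]:
    """Model of analyser 6: scan the image line by line and turn each
    transmittance t into a signal sample centred on zero (assumed law 2t - 1)."""
    return [2.0 * t - 1.0 for line in image for t in line]


def synthesise(store: Dict[Address, Image], program: Sequence[Address]) -> List[float]:
    """Model of FIG. 1: for each address supplied by computer 2, scanning
    means 3 illuminate one element of store 1, whose projected image is
    analysed; the samples are concatenated in program order, forming the
    signal that would be fed to amplifier 9 and loudspeaker 10."""
    signal: List[float] = []
    for address in program:
        image = store[address]          # localised illumination -> projected image
        signal.extend(analyse(image))   # optical-electrical conversion
    return signal
```

The order of the addresses in `program` alone determines the order of the message elements, which is the role the description gives to the computer 2.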

The acoustic information to be stored for subsequent reconstitution and processing by the synthesiser of FIG. 1 is obtained by dividing the messages to be synthesised into samples. The samples may, for example, be the words of a dictionary.

A more elaborate solution uses more elementary samples, such as phonemes, or phonetic elements formed by combining at least two phonemes.

To simplify the following description, the chosen samples will be phonemes, but it should be clearly understood that any other method of sampling could equally well be selected.

Each phoneme represents one item of the acoustic message and is stored as an optical signal using known techniques of optical sound recording.

Each optical signal representing a phoneme can consist of a track of non-uniform transparency recorded on a photographic substrate. The optical store in accordance with the invention is constructed from a set of transparencies carrying, for example, the 30 characteristic phonemes required for the synthesis of the acoustic messages.
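A rough sketch of a variable-density track of this kind maps each audio sample s in [-1, 1] to a transmittance t = (1 + s)/2 and cuts the track into successive lines so that it fills a micro-photograph. The linear law, the padding value and the function names are assumptions made for illustration. Practical optical sound recording must also allow for the non-linear response of the photographic emulsion.

```python
from typing import List, Sequence


def to_transmittance(samples: Sequence[float]) -> List[float]:
    """Assumed linear variable-density law: sample s in [-1, 1]
    becomes transmittance t = (1 + s) / 2 in [0, 1]."""
    if any(abs(s) > 1.0 for s in samples):
        raise ValueError("samples must lie in [-1, 1]")
    return [(1.0 + s) / 2.0 for s in samples]


def fold_into_rows(track: Sequence[float], row_length: int) -> List[List[float]]:
    """Cut the track into successive lines so that it fits a micro-photograph;
    the last line is padded with 0.5, i.e. zero signal."""
    rows = [list(track[i:i + row_length]) for i in range(0, len(track), row_length)]
    if rows and len(rows[-1]) < row_length:
        rows[-1].extend([0.5] * (row_length - len(rows[-1])))
    return rows


def from_transmittance(rows: Sequence[Sequence[float]]) -> List[float]:
    """Read-out: scan the lines in order and invert the linear law."""
    return [2.0 * t - 1.0 for row in rows for t in row]
```

Scanning the lines in the same order in which they were written recovers the samples, plus any padding at the end of the last line.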

FIG. 2 provides a schematic illustration of a first variant embodiment of a synthesiser in accordance with the invention. In this embodiment, the transparencies representing the phonemes are recorded on a substrate 11 as micro-photographs 20 arranged in rows and columns. Using a known technique, these are associated with a matrix 12 of holographic lenses 22, each of which projects the image of the micro-photograph facing it onto the window 13 of an optical-electrical analyser 14. The analyser 14 may be a vidicon tube. By successively illuminating each micro-photograph together with its holographic lens, the sequence of images containing the information needed to synthesise the acoustic message can be produced on the window 13 of the analyser 14. The illumination can be provided by a monochromatic light beam that successively illuminates the micro-photographs 20 or, alternatively, by a matrix of monochromatic light sources. By way of example, the light sources are light-emitting diodes connected to a supply source 18 through switching devices 16 and 17, referred to as feed X and feed Y respectively.

The computer 19 supplies lighting commands to the X and Y feeds 16 and 17 in accordance with a previously determined program. For example, if at a given instant during a synthesis sequence the phoneme corresponding to the micro-image 20 is to be reconstructed, the computer 19 commands the feed X 16 to switch to position b and the feed Y 17 to switch to position d; the light radiation 21 emitted by the source whose address corresponds to the coordinates (b, d) then illuminates the micro-photograph 20. The corresponding holographic lens 22 projects the image of the micro-photograph 20 onto the target 13 of the vidicon tube 14. Concurrently with the addresses that define the illuminated zone of the optical store, the computer 19 supplies a scanner element 23 with a command to scan the image picked up by the target 13. The image must be scanned along the lines of the optical tracks projected onto the target 13; the analyser thus produces an electrical signal 24, which is amplified by the audio amplifier 25 and supplied to the terminals of a loudspeaker 26.
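The X/Y selection of a single source can be illustrated with a V/2 biasing scheme, which is assumed here and not taken from the description. The chosen column is taken to the supply voltage, the chosen row to ground, and all other lines to half the supply. Only the diode at the crossing then exceeds its forward threshold. The voltages, position labels and function names are hypothetical.

```python
from typing import Dict, List, Tuple

COLUMNS = ("a", "b", "c")   # hypothetical positions of feed X 16
ROWS = ("d", "e", "f")      # hypothetical positions of feed Y 17
SUPPLY_V = 2.0              # assumed voltage of supply source 18
FORWARD_V = 1.4             # assumed forward threshold of one diode


def lit_diodes(feed_x: str, feed_y: str) -> List[Tuple[str, str]]:
    """Feed X takes one column to the supply and feed Y grounds one row;
    all other lines sit at SUPPLY_V / 2, so a half-selected diode sees only
    1.0 V and only the diode at the crossing (2.0 V) emits light."""
    col_v: Dict[str, float] = {c: SUPPLY_V if c == feed_x else SUPPLY_V / 2 for c in COLUMNS}
    row_v: Dict[str, float] = {r: 0.0 if r == feed_y else SUPPLY_V / 2 for r in ROWS}
    return [(c, r) for c in COLUMNS for r in ROWS if col_v[c] - row_v[r] > FORWARD_V]


def commands(program: List[str], layout: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Role of computer 19: translate a phoneme sequence into the successive
    (feed X, feed Y) switch positions, using a hypothetical layout table."""
    return [layout[p] for p in program]
```

For the example in the text, `lit_diodes("b", "d")` returns `[("b", "d")]`, so only the source facing the micro-photograph 20 emits.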

The acoustic message reconstituted by the loudspeaker 26 will be intelligible only if the amount of elementary data, and the rate at which these data are transmitted, satisfy certain requirements established in telecommunications engineering.

For a frequency band covering the acoustic range from 200 to 3000 C/s, and with a signal-to-noise ratio of the order of 42 dB, the information rate is close to 47,600 bits per second. Moreover, it is well known that the mean rate of ordinary speech is of the order of 2 words per second. Assuming that each word contains an average of 3 phonemes, i.e. 6 phonemes per second, each phoneme requires about 8,000 bits.
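The per-phoneme figure follows from simple arithmetic. The lines below take the quoted rate of 47,600 bits per second as given and do not re-derive it from the bandwidth and the signal-to-noise ratio.

```python
INFO_RATE_BPS = 47_600   # information rate quoted above, taken as given
WORDS_PER_SECOND = 2     # mean rate of ordinary speech
PHONEMES_PER_WORD = 3    # assumed average number of phonemes per word

phonemes_per_second = WORDS_PER_SECOND * PHONEMES_PER_WORD   # 6 phonemes per second
bits_per_phoneme = INFO_RATE_BPS / phonemes_per_second       # about 7,933 bits
print(f"{bits_per_phoneme:.0f} bits per phoneme")            # prints "7933 bits per phoneme"
```

Rounded up, this gives the figure of about 8,000 bits per phoneme used for sizing each stored image.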

In practice, it is possible to produce holographic lenses capable of resolving two points spaced 3 microns apart. Taking the transverse dimension of a bit as four times this distance, namely 12 microns, a matrix of 90 × 90 bits, i.e. 8,100 bits, can be recorded on a substrate with a side length of 1.08 mm. This is enough for the 8,000 bits required per phoneme. Assuming that the light source is a matrix of electroluminescent gallium arsenide diodes and that the optical-electrical analyser is a matrix of photodiodes, the combined addressing, transmission and read-out time is of the order of 100 nanoseconds. This gives a theoretical read-out rate very substantially in excess of the aforesaid 47,600 bits per second.
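The storage and timing figures can be checked in the same way. The sketch below reuses the numbers quoted above and compares the 100 ns access time with the duration of one phoneme at 6 phonemes per second. The variable names are illustrative only.

```python
RESOLUTION_UM = 3.0                 # smallest spacing resolved by a holographic lens
BIT_PITCH_UM = 4 * RESOLUTION_UM    # 12 microns per bit
SIDE_MM = 1.08                      # side of the micro-image substrate
ACCESS_TIME_S = 100e-9              # addressing + transmission + read-out
PHONEME_DURATION_S = 1 / 6          # 2 words/s x 3 phonemes/word

# round() guards against floating-point error in 1080 / 12
bits_per_side = round(SIDE_MM * 1000 / BIT_PITCH_UM)    # 90
bits_per_image = bits_per_side ** 2                     # 8,100, at least the 8,000 needed
access_fraction = ACCESS_TIME_S / PHONEME_DURATION_S    # about 6e-7 of one phoneme
```

The access time is therefore negligible compared with the duration of the phoneme it delivers.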

The small size of the micro-images and their short access time give the store according to the invention remarkable properties.

In the variant embodiment of FIG. 2 described above, the matrix of micro-images 11 can be regarded as a first stage constituting the optical storage plane, and the matrix 12 of holographic lenses as a second stage constituting the optical transfer plane.

In another variant embodiment of the invention, shown in FIG. 3, these storage and transfer stages are combined into a single stage. This stage performs both functions and consists simply of an array of juxtaposed holograms.

The acoustic information is first processed to form the optical recording of each phoneme; this optical recording then serves to construct a hologram.

A complete store designed in accordance with the invention contains 30 holograms when the synthesis of the acoustic message is based on 30 characteristic phonemes. For synthesis by words or by phonetic elements, a store will contain as many holograms as there are words or phonetic elements deemed necessary for correct speech synthesis.

The store 27 in accordance with the invention thus has as many holograms as there are phonemes, distributed in the x, y plane at well-defined addresses. A laser source 28 may in particular be used to illuminate each of these holograms selectively, so as to reconstitute the data stored in it. The light beam 29 must strike each hologram in the optical storage plane 27 at an angle of incidence determined by the conditions under which that hologram was recorded. The translational displacement or deflection of the light beam 29 is produced by the deflector 30. For example, suppose that during the synthesis of a given message the phoneme corresponding to the hologram 31, with coordinates (x, y), is to be reconstituted. A computer 19 (similar elements carry the same references as in the preceding figure) then supplies an address to the control element 32 of the deflector 30. This address determines the orientation of the incoming beam 29 so that, after passing through the deflector 30, the beam occupies position 33 in the exit plane of the deflector and correctly illuminates the hologram 31. The hologram 31, like every hologram recorded in the storage plane 27, is recorded so that it projects the data it carries onto the target 13 of an analysis tube 14. At the same time as it supplies the positioning addresses to the control element 32 of the deflector 30, the computer 19 triggers an operating cycle of the scanner device 23 associated with the optical-electrical analyser 14. In the case of a vidicon, this scanning is performed by two deflection coils 34 and 35.
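The address supplied to the control element 32 must ultimately be turned into a beam position or angle. As a hedged sketch, assume an angular deflector placed at a fixed distance in front of a square grid of equally spaced holograms, with the optical axis passing through the centre of the grid. The pitch, distance and grid size below are hypothetical values and are not taken from the description.

```python
import math
from typing import Tuple

PITCH_MM = 1.5     # hypothetical spacing of holograms in storage plane 27
THROW_MM = 200.0   # hypothetical distance from deflector 30 to plane 27
GRID = 6           # hypothetical 6 x 6 grid: 36 sites, enough for 30 phonemes


def hologram_centre(x: int, y: int) -> Tuple[float, float]:
    """Centre of hologram (x, y) in mm, measured from the optical axis,
    which is assumed to pass through the middle of the grid."""
    offset = (GRID - 1) / 2
    return ((x - offset) * PITCH_MM, (y - offset) * PITCH_MM)


def deflection_angles(x: int, y: int) -> Tuple[float, float]:
    """Angles in radians that control element 32 would set on an angular
    deflector so that beam 29 lands on the centre of hologram (x, y)."""
    cx, cy = hologram_centre(x, y)
    return (math.atan2(cx, THROW_MM), math.atan2(cy, THROW_MM))
```

With these values, the corner holograms require about 1.07 degrees of deflection on each axis. A purely angular deflector also changes the angle of incidence from one hologram to the next, so the recording geometry would have to tolerate or compensate for that variation. A translational displacement of the beam, the other option mentioned, avoids this variation.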

The electrical output signal 24 resulting from the analysis of the optical input signal is amplified by the audio amplifier 25 and applied to the terminals of a loudspeaker 26, which reconstitutes it in audible form.

Under the control of the program supplied by the computer 19, the data stored in the holograms of the storage plane 27 are reconstructed in the order corresponding to the spoken message.

The intelligibility conditions set out above for the variant embodiment of FIG. 2 remain valid for the variant embodiment of FIG. 3.

FIG. 4 schematically illustrates an optical device for constructing holograms for use in the device of FIG. 3. The hologram-constructing device is designed so that the holograms can project a real image into the read-out equipment. FIG. 4 illustrates two variants, which differ in the direction from which the reference wave is applied to the photographic emulsion 111.

Coherent radiation 100 issuing from a laser source (not shown in FIG. 4) is split by a beam splitter 101 into two beams 102 and 103.

The object beam 102 illuminates a diffracting object 104. In the present invention, this object is a transparency on which one of the phonemes used for the synthesis of the acoustic messages has previously been recorded as a pattern of non-uniform transparency. Under the action of the beam 102, the object 104 emits diffracted radiation, which is picked up by a lens 106 designed to project the real image A1 B1 of the object AB, which lies in the object plane M0. The light emerging from the lens 106 falls on a portion of the unexposed emulsion 111 carried by the substrate 110. The same portion of the emulsion 111 receives the reference beam 103 via the mirror 126 and the semi-transparent plate 107. These two radiation components interfere and form a pattern of interference fringes at the surface of the emulsion 111, where the pattern is recorded. After the emulsion 111 has been developed, a hologram H1 H2 is obtained. When illuminated with a read-out beam having the same characteristics as the beam reflected by the plate 107, this hologram projects a real image A1 B1 onto the plane Mi. The same result can be obtained if the beam 103 is directed onto the face of the emulsion 111 opposite the one that receives the light emerging from the lens 106; in this case, the elements 107 and 126 are replaced by the mirrors 112 and 113.
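The fringe pattern recorded in the emulsion 111 results from the interference of the two waves, I = |O + R|^2. For two plane waves meeting the emulsion at angles theta_o and theta_r to its normal, the fringe period is lambda / |sin theta_r - sin theta_o|. The sketch below assumes a helium-neon wavelength of 0.6328 microns and real plane-wave amplitudes. A real object wave is not a plane wave, so this model gives only the carrier fringe spacing.

```python
import math

WAVELENGTH_UM = 0.6328   # assumed helium-neon laser line


def fringe_period_um(theta_obj_deg: float, theta_ref_deg: float) -> float:
    """Spacing of the fringes recorded where two plane waves meet the
    emulsion at angles theta_obj and theta_ref to its normal."""
    s = math.sin(math.radians(theta_ref_deg)) - math.sin(math.radians(theta_obj_deg))
    if s == 0:
        raise ValueError("beams at equal angles produce no fringes")
    return WAVELENGTH_UM / abs(s)


def intensity(x_um: float, theta_obj_deg: float, theta_ref_deg: float,
              a_obj: float = 1.0, a_ref: float = 1.0) -> float:
    """Recorded intensity |O + R|^2 at position x along the emulsion,
    for real amplitudes a_obj and a_ref."""
    period = fringe_period_um(theta_obj_deg, theta_ref_deg)
    return a_obj ** 2 + a_ref ** 2 + 2 * a_obj * a_ref * math.cos(2 * math.pi * x_um / period)
```

For an object beam along the normal and a reference beam at 30 degrees, `fringe_period_um(0, 30)` is about 1.27 microns. The emulsion must therefore resolve roughly 790 lines per millimetre.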

If (x, y) denotes the optical axis of the store described in FIG. 3, the distance d separating the plane Mi from the plane MH of the photographic emulsion corresponds to the distance separating the storage plane 27 from the target plane 13 of the analysis tube 14. Thus, on reconstruction, the hologram H1 H2 of FIG. 4, centred on the axis O1 O2, forms on the target of the analysis tube a real image, centred on the axis (x, y), that contains all the data carried by the object AB. The series of holograms forming the holographic matrix 27 is produced by repeating the recording process, each time changing the object and displacing the emulsion 111 and the lens 106.

Each of these holograms is designed to project a real image that is always centred on the target of the analysis tube.

FIG. 5 schematically illustrates another hologram-construction device, which differs from that of FIG. 4 in that the projected image is a virtual image. To simplify the description, the same references as in the preceding figure are used.

The object 104, a transparent substrate carrying the image AB that represents the phoneme to be stored, is placed in the plane M0. It is illuminated by a beam 102, which is derived from the radiant energy 120 produced by a laser source 121 after reflection by an auxiliary mirror 122. Another fraction of the radiant energy 120 is reflected by the semi-transparent mirror 130 as radiant energy 103. This fraction constitutes the reference wave, which interferes in the plane MH with the wave 109 diffracted by the image AB. A transparent substrate 110 carrying a photographic emulsion 111 is placed in this plane. A mask 123 limits the exposure of the emulsion 111 to the zone reserved for the hologram. After the photographic plate has been developed, the hologram H1 H2, centred on the axis O1 O2, contains all the information carried by the object AB. When correctly illuminated by a reconstruction wave, the hologram reproduces all these data as a virtual image located at the distance D1 from the hologram plane.

A lens L is shown to the right of the emulsion 111; it does not form part of the hologram-construction device. This lens belongs to the optical system for analysing the images. It is shown in FIG. 5 to indicate how the virtual image produced by the hologram on read-out can be used to form a real image A1 B1 in a plane Mi, which in this case is the plane of the target 13 of the optical-electrical detector 14.

As with the device of FIG. 4, the number of holograms corresponding to the number of phonemes to be stored can be produced on the same substrate. In the hologram-manufacturing machine, it is merely necessary to displace the mask 123 to uncover a fresh portion of photographic emulsion each time a new object is recorded.

At reconstruction, the virtual images produced by the holograms are all located in the plane M0, at the distance D1 from the hologram plane. The lens L converts each of these virtual images into a real image centred on the axis (x, y) on the target of the detector.
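The role of the lens L can be checked with the Gaussian thin-lens equation 1/s + 1/s' = 1/f. Here s is measured from L back to the virtual image in plane M0, and s' is measured from L forward to the plane Mi. The gap between the hologram and L, and the focal length, are hypothetical values chosen for illustration.

```python
from typing import Tuple


def real_image(d1_mm: float, lens_gap_mm: float, focal_mm: float) -> Tuple[float, float]:
    """Thin-lens sketch of lens L in FIG. 5.

    The virtual image lies d1_mm in front of the hologram plane MH, and L is
    assumed to sit lens_gap_mm behind that plane, so the image acts as an
    object for L at s = d1_mm + lens_gap_mm. Returns (s', m): the distance
    from L to the plane Mi of target 13, and the lateral magnification.
    """
    s = d1_mm + lens_gap_mm
    if s <= focal_mm:
        raise ValueError("object at or inside the focal length: no real image")
    s_img = 1.0 / (1.0 / focal_mm - 1.0 / s)
    return s_img, -s_img / s
```

With D1 = 100 mm, a 20 mm gap and f = 60 mm, `real_image(100, 20, 60)` places the target 120 mm behind L at unit magnification (m = -1, an inverted image). Every hologram places its virtual image in the same plane M0, so one lens and one fixed target position serve the whole store.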

In FIG. 5, the reference wave 103 is projected onto the photographic emulsion on the same side as the wave carrying the object. By means of a set of auxiliary mirrors not shown in FIG. 5, however, this reference wave 103 could be made to illuminate the opposite face of the emulsion, as in FIG. 4.