Speech recognition apparatus
United States Patent 6061652
US Patent References:
Speech recognition apparatus
Shikano - February, 1986 - 4570232

Speech recognition method
Kuroda et al. - May, 1989 - 4829577

Speech recognition apparatus using neural network and fuzzy logic
Amano - August, 1991 - 5040215

Speech recognition method
Nishimura - September, 1991 - 5050215

Method of recognizing coherently spoken words
Ney et al. - October, 1991 - 5058166


Inventors:
Tsuboka, Eiichi (Neyagawa, JP)
Nakahashi, Junichi (Nara, JP)
Application Number:
08/596305
Publication Date:
05/09/2000
Filing Date:
06/18/1996
Assignee:
Matsushita Electric Industrial Co., Ltd. (Osaka, JP)
Primary Class:
Other Classes:
382/225, 704/E15.028
International Classes:
G10L15/14; G10L15/00; G10L5/06; G10L9/00
Field of Search:
395/2.54, 382/225
US Patent References:
Pattern recognition apparatus
Tsuboka - July, 1992 - 5129002 - 381/43

Adaptive fast fuzzy clustering system
Bickel - November, 1993 - 5263120 - 704/245

Voice analyzing system using hidden Markov model and having plural neural network predictors
Tsuboka - April, 1994 - 5307444 - 704/256

Method and apparatus for pattern recognition employing the hidden Markov model
Tsuboka - March, 1997 - 5608841 - 395/2.65
Foreign References:
JP63250698October, 1988
Other References:
E. Tsuboka et al., "On the Multiplication Type FVQ/HMM", Technical Report of IEICE, SP-93-27 (Jun. 1993), pp. 25-31.
Japanese language search report dated Sep. 19, 1995.
English translation of Japanese language search report.
A. El Gamal et al., "Using Simulated Annealing to Design Good Codes,"
Primary Examiner:
Hudspeth, David R.
Assistant Examiner:
Sax, Robert Louis
Attorney, Agent or Firm:
Ratner & Prestia
Claims:
1. A speech recognition apparatus in which a system that is an object of analysis is provided with a plurality of states, comprising:
a code book in which a feature vector space is clustered and representative vectors of the respective clusters are stored in a form searchable by their labels;
cluster occurrence probability storing means for storing occurrence probabilities of the respective labels in the respective states;
membership degree calculating means for calculating membership degrees of observation vectors to the respective clusters by using the code book; and
observation vector occurrence degree calculating means for calculating a sum of products of
(a) the calculated membership degrees of the observation vectors to the respective clusters and
(b) logarithmic values of the occurrence probabilities of the respective clusters stored in the cluster occurrence probability storing means, or an amount equivalent thereto,
wherein the sum of products corresponds to occurrence degrees of the observation vectors in the respective states of the system, and
the cluster occurrence probability storing means calculates the occurrence probabilities of the clusters such that, for a predetermined value N, the occurrence probabilities of the clusters ranked (N+1)-th or lower are rendered a non-zero common value, and a total sum of the occurrence probabilities of the clusters is 1.

2. The speech recognition apparatus according to claim 1, wherein the membership degree calculating means calculates the membership degrees such that the membership degrees of the clusters ranked (K+1)-th or lower are rendered zero and a total sum of the membership degrees is 1.

3. A speech recognition apparatus in which a system that is an object of analysis is provided with a plurality of states, comprising a code book in which a feature vector space is clustered and representative vectors of the respective clusters are stored in a form searchable by their labels, a cluster occurrence probability storing means for storing occurrence probabilities of the respective labels in the respective states, a membership degree calculating means for calculating membership degrees of observation vectors to the respective clusters by using the code book, and an observation vector occurrence degree calculating means for calculating a product sum of logarithmic values of the calculated membership degrees of the observation vectors to the respective clusters and the occurrence probabilities of the respective clusters stored in the cluster occurrence probability storing means, or an amount equivalent thereto, and thereby calculating occurrence degrees of the observation vectors in the respective states of the system.

4. The speech recognition apparatus according to claim 3, wherein the cluster occurrence probability storing means stores, in each state, occurrence probabilities calculated such that, for a predetermined N, the occurrence probabilities of the clusters ranked up to the N-th highest sum to 1 and the occurrence probabilities of the other clusters are rendered 0.

5. The speech recognition apparatus according to claim 3, wherein the membership degree calculating means calculates the membership degrees of the observation vectors to the respective clusters such that the membership degrees of the clusters ranked (K+1)-th or lower are rendered a common non-zero value and a total sum of the membership degrees is 1.

6. The speech recognition apparatus according to claim 1 or claim 3, wherein the respective states are the respective states of a Hidden Markov Model.

7. The speech recognition apparatus according to claim 1, wherein
the occurrence probabilities of the respective clusters in the respective states are estimated by using the observation vector occurrence degree calculating means, and
at recognition time the membership degrees of the observation vectors are calculated such that the maximum membership degree is rendered 1 and all the other membership degrees are rendered 0.

8. A speech recognition apparatus comprising a cluster storing means in which a feature vector is to be classified; a membership degree calculating means for calculating, with respect to vectors y and z to be compared, a membership degree of each of the vectors to each of the clusters, or a posterior probability of each of the clusters given each of the vectors, and for calculating membership degree vectors a and b whose elements are the membership degrees of the respective vectors to the respective clusters; and a similarity degree calculating means for calculating a distance or a similarity degree between the membership degree vectors, wherein the distance or the similarity degree is rendered the distance or the similarity degree of the feature vectors y and z.

9. The speech recognition apparatus according to claim 8, comprising a cluster storing means in which a feature vector is to be classified; a membership degree calculating means for calculating a membership degree of each vector of a vector series to each cluster and calculating a membership degree vector whose elements are the membership degrees of the vector to the respective clusters; a reference pattern storing means expressing a recognition unit to be compared likewise as a membership degree vector series; and a matching means for matching an input pattern, consisting of the membership degree vector series output by the membership degree calculating means, with the reference pattern, wherein a similarity degree or a distance between the input pattern and the reference pattern is calculated as a result of the matching.

10. The speech recognition apparatus according to claim 9, further comprising a similarity degree calculating means for calculating the distance or the similarity degree of corresponding similarity degree vectors after aligning the time axes of both patterns by linearly or nonlinearly expanding or contracting the time axis of either the input pattern or the reference pattern, or the time axes of both, and an accumulation similarity degree calculating means for accumulating the distance or the similarity degree along the time axis of either the input pattern or the reference pattern, or along the time axes of both, wherein the accumulated value is rendered the distance or the similarity degree between the input pattern and the reference pattern.

11. The speech recognition apparatus according to claim 9, further comprising a similarity degree calculating means for calculating distances or similarity degrees between membership degree vectors, and a dynamic programming means for optimally associating each of the membership degree vectors constituting the input pattern with each of the membership degree vectors constituting the reference pattern to be matched with the input pattern, such that the accumulated value of the distances or the similarity degrees between the associated membership degree vectors of both patterns, accumulated along the time axis of either the input pattern or the reference pattern, or along the time axes of both, is minimized or maximized, and for calculating the minimum or maximum value thereof.

12. The speech recognition apparatus according to claim 8, wherein, when two membership degree vectors whose distance or similarity degree is to be calculated are defined as $a = (a_1, \dots, a_M)$ and $b = (b_1, \dots, b_M)$, the similarity degree calculating means calculates the distance or the similarity degree by either of the following equations, or as an amount equivalent thereto,
##EQU78##
where $C$ is a constant, $C > 0$.

13. The speech recognition apparatus according to claim 11, wherein, when the membership degree vector corresponding to frame $t$ of the input pattern is defined as $a_t = (a_{t1}, \dots, a_{tM})$, the membership degree vector corresponding to frame $j$ of the reference pattern is defined as $b_j = (b_{j1}, \dots, b_{jM})$, the k-th $(t, j)$ coordinate on the matching path is defined as $x(k) = (t(k), j(k))$, and the weighting coefficient at $x(k)$ is defined as $w(x(k))$, the similarity degree of $a_{t(k)}$ and $b_{j(k)}$ is specified as below,
##EQU79##
and the accumulated similarity degree along the path of the vector series $a_{t(1)}, \dots, a_{t(K)}$ and $b_{j(1)}, \dots, b_{j(K)}$ is specified as below,
##EQU80##
wherein, for $1 \le n \le k-1$, if $t(k) - t(k-n) = 1$, then $w(x(k-n+1)) + \dots + w(x(k)) = 1$.

14. The speech recognition apparatus according to claim 13, wherein the matching path satisfies $t(k) - t(k-1) = 1$ and $w(x(k)) = 1$.

15. The speech recognition apparatus according to claim 13, wherein, for $x(k) = (t, j)$ and $k-1 \ge n \ge 1$, the matching path includes any of:
(1) $x(k-1) = (t-1, j-n)$ or $x(k-1) = (t-1, j)$;
(2) $x(k-1) = (t-1, j-1)$ or $x(k-1) = (t-1, j)$, and $x(k-m) = (t-1, j-m)$ for $m = 2, \dots, n$;
(3) $x(k-m) = (t, j-m)$ and $x(k-n) = (t-1, j-n)$ for $m = 1, \dots, n-1$;
(4) $x(k-m) = (t, j-m)$ and $x(k-n) = (t-1, j-n)$ for $m = 1, \dots, n-1$; and
(5) $x(k-1) = (t-1, j-1)$ or $x(k-1) = (t-1, j)$, and $x(k-m) = (t-1, j-m)$ for $m = 2, \dots, n$;
and $w(x(k)) = 1$ for path (1); $w(x(k)) = 1$ and $w(x(k-m+1)) = 0$ for path (2); $w(x(k-m+1)) = 0$ and $w(x(k-n+1)) = 1$ for path (3); and $w(x(k-m+1)) = 1/n$ for paths (4) and (5).

16. The speech recognition apparatus according to claim 11, wherein, when the membership degree vector corresponding to frame $t$ of the input pattern is defined as $a_t = (a_{t1}, \dots, a_{tM})$, the membership degree vector corresponding to frame $j$ of the reference pattern is defined as $b_j = (b_{j1}, \dots, b_{jM})$, the k-th $(t, j)$ coordinate on the matching path is defined as $x(k) = (t(k), j(k))$, and the weighting coefficient at $x(k)$ is defined as $w(x(k))$, the similarity degree calculating means specifies the similarity degree of $a_{t(k)}$ and $b_{j(k)}$ as follows,
##EQU81##
and the similarity degree of the vector series $a_{t(1)}, \dots, a_{t(K)}$ and $b_{j(1)}, \dots, b_{j(K)}$ along the path is specified as follows,
##EQU82##
where, for $1 \le n \le k-1$, if $j(k) - j(k-n) = 1$, then $w(x(k-n+1)) + \dots + w(x(k)) = 1$.

17. The speech recognition apparatus according to claim 16, wherein the matching path satisfies $j(k) - j(k-1) = 1$ and $w(x(k)) = 1$.

18. The speech recognition apparatus according to claim 16, wherein, for $x(k) = (t, j)$ and $k-1 \ge n \ge 1$, the matching path includes any of:
(1) $x(k-1) = (t-n, j-1)$ or $x(k-1) = (t, j-1)$;
(2) $x(k-1) = (t-1, j-1)$ or $x(k-1) = (t, j-1)$, and $x(k-m) = (t-m, j-1)$ for $m = 2, \dots, n$;
(3) $x(k-m) = (t-m, j)$ and $x(k-n) = (t-n, j-1)$ for $m = 1, \dots, n-1$;
(4) $x(k-m) = (t-m, j)$ and $x(k-n) = (t-n, j-1)$ for $m = 1, \dots, n-1$; and
(5) $x(k-1) = (t-1, j-1)$ or $x(k-1) = (t, j-1)$, and $x(k-m) = (t-m, j-1)$ for $m = 2, \dots, n$;
and $w(x(k)) = 1$ for path (1); $w(x(k)) = 1$ and $w(x(k-m+1)) = 0$ for path (2); $w(x(k-m+1)) = 0$ and $w(x(k-n+1)) = 1$ for path (3); and $w(x(k-m+1)) = 1/n$ for paths (4) and (5).

19. The speech recognition apparatus according to claim 8, wherein, when the membership degree of a feature vector of frame $j$ of a reference pattern to cluster $m$ is defined as $b_{jm}$ and the number of clusters is $M$, the apparatus further comprises a reference pattern storing means for storing the $N$ largest values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ taken from $b_{j1}, \dots, b_{jM}$ in decreasing order ($g(j,n)$ designates the label of the n-th largest cluster in frame $j$ of the reference pattern, $N \le M$) as they are, the remaining terms having a constant value $b_0$ calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} + b_0(M-N) = 1$, or in the form of the respective logarithmic values $\log b_{j,g(j,1)}, \log b_{j,g(j,2)}, \dots, \log b_{j,g(j,N)}$ and $\log b_0$.

20. The speech recognition apparatus according to claim 8, wherein, when the membership degree of a feature vector of frame $j$ of a reference pattern to cluster $m$ is defined as $b_{jm}$ and the number of clusters is $M$, the $N$ largest values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ taken from $b_{j1}, \dots, b_{jM}$ in decreasing order ($g(j,n)$ designates the label of the n-th largest cluster in frame $j$ of the reference pattern, $N \le M$) are stored as values calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} = 1$, and the remaining terms are stored such that $b_{j,g(j,N+1)} = \dots = b_{j,g(j,M)} = 0$.

21. The speech recognition apparatus according to claim 8, wherein, when the membership degree of a feature vector $y_t$ at frame $t$ of an input pattern to cluster $m$ is defined as $u_{tm}$ and the number of clusters is $M$, the membership degree vector into which $y_t$ is transformed has values calculated such that the $K$ largest values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ taken from $u_{t1}, \dots, u_{tM}$ in decreasing order ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$) remain as they are, and the remaining terms have a constant value $u_0$ satisfying $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} + u_0(M-K) = 1$.

22. The speech recognition apparatus according to claim 8, wherein, when the membership degree of a feature vector $y_t$ at frame $t$ of an input pattern to cluster $m$ is defined as $u_{tm}$ and the number of clusters is $M$, the membership degree vector into which $y_t$ is transformed has values calculated such that the $K$ largest values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ taken from $u_{t1}, \dots, u_{tM}$ in decreasing order ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$) have values satisfying $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} = 1$, and the remaining terms are rendered $u_{t,h(t,K+1)} = \dots = u_{t,h(t,M)} = 0$.

23. The speech recognition apparatus according to claim 8, wherein the similarity degree of the t-th frame of an input pattern and the j-th frame of a reference pattern is specified below,
##EQU83##
with respect to the $N$ largest values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ taken from $b_{j1}, \dots, b_{jM}$ in decreasing order ($g(j,n)$ designates the label of the n-th largest cluster at frame $j$ of the reference pattern, $N \le M$), a value $b_0$ calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} + b_0(M-N) = 1$, and either $u_{tm}$ calculated for all the clusters or the $K$ values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ calculated to satisfy $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} = 1$ in correspondence with the decreasing order of $u_{t1}, \dots, u_{tM}$ ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$).

24. The speech recognition apparatus according to claim 8, wherein the similarity degree of the t-th frame of an input pattern and the j-th frame of a reference pattern is specified below,
##EQU84##
with respect to the $N$ values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} = 1$ in correspondence with the decreasing order of $b_{j1}, \dots, b_{jM}$ ($g(j,n)$ designates the label of the n-th largest cluster at frame $j$ of the reference pattern, $N \le M$), and either $u_{tm}$ calculated for all the clusters or the $K$ largest values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ taken from $u_{t1}, \dots, u_{tM}$ in decreasing order ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$) together with a value $u_0$ calculated to satisfy $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} + u_0(M-K) = 1$.

25. The speech recognition apparatus according to claim 8, wherein the similarity degree of the t-th frame of an input pattern and the j-th frame of a reference pattern is specified as follows,
##EQU85##
with respect to either $b_{jm}$ calculated for all the clusters or the $N$ values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} = 1$ in correspondence with the decreasing order of $b_{j1}, \dots, b_{jM}$ ($g(j,n)$ designates the label of the n-th largest cluster at frame $j$ of the reference pattern, $N \le M$), the $K$ largest values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ taken from $u_{t1}, \dots, u_{tM}$ in decreasing order ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$), and a value $u_0$ calculated to satisfy $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} + u_0(M-K) = 1$.

26. The signal recognition apparatus according to claim 1, wherein a similarity degree of the t-th frame of an input pattern and the j-th frame of a reference pattern is specified below,
##EQU86##
with respect to either $b_{jm}$ calculated for all the clusters or the $N$ largest values $b_{j,g(j,1)}, b_{j,g(j,2)}, \dots, b_{j,g(j,N)}$ taken from $b_{j1}, \dots, b_{jM}$ in decreasing order ($g(j,n)$ designates the label of the n-th largest cluster at frame $j$ of the reference pattern, $N \le M$) together with $b_0$ calculated to satisfy $b_{j,g(j,1)} + \dots + b_{j,g(j,N)} + b_0(M-N) = 1$, and the $K$ values $u_{t,h(t,1)}, u_{t,h(t,2)}, \dots, u_{t,h(t,K)}$ calculated to satisfy $u_{t,h(t,1)} + \dots + u_{t,h(t,K)} = 1$ in correspondence with the decreasing order of $u_{t1}, \dots, u_{tM}$ ($h(t,k)$ designates the label of the k-th largest cluster at frame $t$ of the input pattern, $K \le M$).

27. The speech recognition apparatus according to claim 5 or claim 19, wherein the membership degrees are calculated from the distances between the vectors for which the membership degrees are calculated and the representative vectors of the respective clusters, using the distances as they are for the clusters ranked, in order of increasing distance, up to the above-mentioned K in the case of the former and up to the above-mentioned N in the case of the latter, and using a common value for the clusters ranked (K+1)-th or lower or (N+1)-th or lower.

28. The speech recognition apparatus according to claim 27, wherein the common value is the average of the distances to the clusters ranked at the above-mentioned (K+1)-th or lower or the above-mentioned (N+1)-th or lower.

29. The speech recognition apparatus according to claim 27, wherein the common value is the average of the minimum distance and the maximum distance among the clusters ranked at the above-mentioned (K+1)-th or lower or the above-mentioned (N+1)-th or lower.

30. The speech recognition apparatus according to claim 5 or claim 19, wherein, for the clusters ranked (K+1)-th or lower, or (N+1)-th or lower, in order of increasing distance between the observation vector for which the membership degrees are calculated and the representative vectors of the respective clusters, the membership degrees are set equal to a predetermined constant value of 1/K or less, or 1/N or less, and the membership degrees of the K, or N, clusters with the smallest distances are calculated from the respective distances and the constant value such that a total sum of the membership degrees becomes 1.
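The two truncated storage forms recited in claims 19 to 22 (keep the N or K largest values and give the rest a common floor value, or keep only those values renormalized and set the rest to zero) can be illustrated with a short Python/NumPy sketch. It is a sketch under stated assumptions, not the patented implementation: the function names are hypothetical, dividing by the sum of the kept values is an assumed way of making them total 1, and the frame score $\sum_m u_{tm} \log b_{jm}$ is used only as an example similarity, since the equations of claims 23 to 26 are not reproduced in this text.

```python
import numpy as np


def truncate_keep_top_floor_rest(p: np.ndarray, n: int) -> np.ndarray:
    """Keep the n largest entries as they are and give the other M - n entries
    a common value p0 chosen so that the vector sums to 1
    (the b_0 / u_0 storage form of claims 19 and 21)."""
    m = p.size
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= M")
    top = np.argsort(p)[::-1][:n]        # labels g(j,1), ..., g(j,n), largest first
    out = np.array(p, dtype=float)
    if n == m:
        return out
    p0 = (1.0 - p[top].sum()) / (m - n)  # common value for ranks n+1 .. M
    out[:] = p0
    out[top] = p[top]
    return out


def truncate_renormalize_top(p: np.ndarray, n: int) -> np.ndarray:
    """Keep only the n largest entries, rescaled to sum to 1, and set the rest
    to 0 (the storage form of claims 20 and 22; rescaling is an assumption)."""
    top = np.argsort(p)[::-1][:n]
    out = np.zeros(p.size, dtype=float)
    out[top] = p[top] / p[top].sum()
    return out


def frame_log_similarity(u: np.ndarray, b: np.ndarray) -> float:
    """Illustrative frame score sum_m u_m * log b_m over clusters with u_m > 0."""
    mask = u > 0
    return float(np.sum(u[mask] * np.log(b[mask])))
```

With a floored reference vector every logarithm stays finite, which is one practical reason for storing $\log b_0$ alongside the top-N logarithms; with the zero-filled form, a cluster that has positive input membership but a stored reference value of 0 would contribute $-\infty$ to this particular score.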

Description:

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an explanatory view of an HMM.

FIG. 2 is a diagram showing an example of an HMM that is frequently used in speech recognition.

FIG. 3 is a block diagram showing a conventional example of a speech recognition device using a discrete HMM.

FIG. 4 is a block diagram showing a conventional example, and an embodiment of the present invention, of a speech recognition device using an HMM based on fuzzy vector quantization.

FIG. 5 is a block diagram of a conventional example of a speech recognition device using pattern matching.

FIG. 6 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 7 is a block diagram showing a conventional example of a speech recognition device using vector quantization.

FIG. 8 is an explanatory view of a method of word spotting.

FIG. 9 is a block diagram showing an embodiment of a speech recognition device using DP matching according to the present invention, based on fuzzy vector quantization.

FIG. 10 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 11 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 12 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 13 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 14 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the input pattern axis dependent type.

FIG. 15 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 16 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 17 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 18 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 19 is an explanatory view showing an example of a restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 20 is an explanatory view showing an example of a frequently used restriction condition on the matching paths of DP matching of the reference pattern axis dependent type.

FIG. 21 is an explanatory view explaining a method of storing the occurrence probability of a cluster in each state of an HMM according to the present invention, or the membership degree of a feature vector to a cluster in a reference pattern of DP matching according to the present invention.

FIG. 22 is an explanatory view explaining a method of storing the occurrence probability of a cluster in each state of an HMM according to the present invention, or the membership degree of a feature vector to a cluster in a reference pattern of DP matching according to the present invention.

FIG. 23 is an explanatory view explaining a method of storing the membership degree of a feature vector to a cluster in an input pattern of an HMM according to the present invention or of DP matching according to the present invention.

FIG. 24 is an explanatory view explaining a method of storing the membership degree of a feature vector to a cluster in an input pattern of an HMM according to the present invention or of DP matching according to the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

Embodiments of the present invention will now be explained with reference to the drawings.

Equation 7 is derived from the Kullback-Leibler divergence (hereinafter, KLD) between the distribution $u_t = \{u_{t1}, \dots, u_{tM}\}$ and the distribution $b_i = \{b_{i1}, \dots, b_{iM}\}$ (literature: Technical Report of the Institute of Electronics, Information and Communication Engineers, SP93-27, June 1993). That is, defining the separation degree of $u_t$ from $b_i$ as $D(u_t \| b_i)$, the KLD is given by the following equation.
##EQU17##

This quantity expresses how unlikely $u_t$ is to arise from the population $b_i$, in other words, how unlikely $u_t$ is to occur in state $i$. Accordingly, putting $\log \phi_i(y_t) = -D(u_t \| b_i)$, $\phi_i(y_t)$ expresses how readily $u_t$ occurs in state $i$ and can be used as $\omega_i(y_t)$. Equation 1 can then be rewritten as follows by substituting $\phi_i(y_t)$ for $\omega_i(y_t)$.
##EQU18##

The component of this equation specified below is determined by the input alone, irrespective of the model. It therefore need not be considered when models are compared on the basis of the likelihood of an input pattern given by Equation 22.
##EQU19##
The following equation can then be newly defined.
##EQU20##
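Assuming the standard discrete form of the KLD, $D(u_t \| b_i) = \sum_m u_{tm} \log (u_{tm}/b_{im})$ (the image for EQU17 is not reproduced in this text), the split into a model-dependent part and an input-only part can be written out as follows. This is a sketch of the reasoning, not a reproduction of the patent's numbered equations.

```latex
\begin{aligned}
\log \phi_i(y_t) = -D(u_t \,\|\, b_i)
  &= -\sum_{m=1}^{M} u_{tm} \log \frac{u_{tm}}{b_{im}} \\
  &= \underbrace{\sum_{m=1}^{M} u_{tm} \log b_{im}}_{\text{depends on the state } i}
     \;-\; \underbrace{\sum_{m=1}^{M} u_{tm} \log u_{tm}}_{\text{depends only on the input}}
\end{aligned}
```

Dropping the second term, which is the same for every model and state, leaves $\log \omega_i(y_t) = \sum_m u_{tm} \log b_{im}$, i.e. $\omega_i(y_t) = \prod_m b_{im}^{u_{tm}}$. This product form is consistent with the name "multiplication type" and with the sum of products of membership degrees and logarithmic occurrence probabilities recited in claim 1.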

That is, Equation 24 is obtained by defining the component of Equation 1 as follows.
##EQU21##

Equation 7 can be derived in this manner. This is the principle of themultiplication type FVQ/HMM.
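Under this reading, the occurrence degree of an observation vector in state $i$ reduces to $\sum_m u_{tm} \log b_{im}$. The Python sketch below computes it for every state at once. The membership function is an assumption (the fuzzy c-means form $u_m \propto d_m^{-2/(F-1)}$ with fuzziness $F$ and Euclidean distance $d_m$ to the representative vectors), because the membership formula used by the patent is not part of this excerpt; all names are hypothetical.

```python
import numpy as np


def fuzzy_memberships(y, codebook, fuzziness=2.0, eps=1e-12):
    """Membership degrees of vector y to each cluster (assumed fuzzy-c-means form).

    u_m is proportional to d_m ** (-2 / (F - 1)), normalized to sum to 1, where
    d_m is the Euclidean distance from y to representative vector m.
    """
    d = np.linalg.norm(codebook - y, axis=1) + eps  # eps avoids division by zero
    w = d ** (-2.0 / (fuzziness - 1.0))
    return w / w.sum()


def log_occurrence_degrees(u, b):
    """log omega_i(y_t) = sum_m u_tm * log b_im for every state i.

    u: (M,) membership degrees of one observation vector.
    b: (J, M) cluster occurrence probabilities, one row per state.
    """
    return np.log(b) @ u
```

With a one-hot membership vector the score reduces to the logarithm of a single discrete probability, which corresponds to the hard assignment that claim 7 describes for recognition time.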

However, the above procedure holds under the assumption that every vector forming an input pattern $Y$ occurs exactly once along any state series $X$ of each HMM. When the input pattern is instead regarded as a concatenation of vector series generated by several models, and we search for the partial section most likely to have been generated by a specific model, a different approach is needed. In this case, for a model to be compared, we define $S(X)$ as the input frame corresponding to the initial state 1 of a state series $X$, and $E(X)$ as the input frame corresponding to the final state $J$. In principle, the following equation is calculated for every $X$.
##EQU22##

The following equation is then calculated, and the frames $S(X^*)$ through $E(X^*)$ give the desired partial section of the input speech pattern.
##EQU23##

In this calculation, all the combinations of $E(X)$ and $S(X)$ must be evaluated, so the amount of calculation becomes very large. Moreover, Equation 23 now depends on $X$ and therefore cannot be omitted. We therefore consider solving the optimization problem of Equation 27 by the Dynamic Programming method. Taking a frame $s$ as the start frame, the end frame $t$ is varied over the range $t' \pm V$, and the degrees of occurrence of the partial input patterns $(y_s, \dots, y_t)$ from HMM μ are calculated by the Dynamic Programming method. In this case Equation 10 is applicable, and the Dynamic Programming method yields the optimum state series, i.e. the one maximizing the occurrence degree, for each of the input patterns $(y_s, \dots, y_{t'-V}), \dots, (y_s, \dots, y_{t'+V})$. That is, the end frame $t$ is moved over a suitable range and the optimum one is selected among those end frames, which gives the optimum end frame for the start frame $s$. The range of end frames can be set in advance, for example from the average length of the vector series generated by the HMM to be compared, measured from the start frame $s$. In this case the score must be normalized by $t-s+1$ every time $t$ is changed. Performing the same operation while moving $s$ yields the optimum start and end frames, i.e. the desired partial section. Equation 10 need not be recalculated from scratch every time $t$ is changed: when the model of FIG. 2 is used, for a given $s$ the correspondence between input feature vectors and HMM states is limited to the hatched range of FIG. 8. The range of input frames $m(i)$ corresponding to state $i$ in the hatched portion is specified as follows.
##EQU24##

The range of $m(i+1)$ is specified as follows.
##EQU25##

Following Equation 10, $\phi_{i+1}(m(i+1))$ over the range specified by Equation 29 is calculated as a continuation of $\phi_i(m(i))$ over the range of $m(i)$ specified by Equation 28.

Accordingly, when $\phi_i(m(i))$ is calculated over the range of Equation 28 for each $i = 1, \dots, J+1$, each value of $\phi_{J+1}(m(J+1))$ obtained over the range specified below for $i = J+1$ is the comparison result for the corresponding end frame in $t' \pm V$ when the start frame of the input pattern is $s$.
##EQU26##
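Reading off every end frame for one start frame in a single pass can be sketched as follows. Equation 10 is not reproduced in this excerpt, so the sketch substitutes an ordinary frame-synchronous Viterbi recursion for a left-to-right model with self-loops and forward transitions only (an assumption about the model of FIG. 2), and it ignores any exit transition from the last state. Scores are normalized by $t-s+1$ as the text requires. Function and argument names are hypothetical, and frames are 0-based.

```python
import numpy as np


def scores_for_end_frames(log_omega, log_a_stay, log_a_next, s, t_lo, t_hi):
    """One Viterbi pass of a left-to-right HMM started at frame s.

    log_omega: (T, J) array of log omega_i(y_t) for every frame t and state i.
    log_a_stay, log_a_next: (J,) log self-loop / forward transition probabilities.
    Returns {t: score} for every end frame t in [t_lo, t_hi], where the score is
    the best log degree of y_s..y_t ending in the last state divided by t - s + 1.
    """
    T, J = log_omega.shape
    if not 0 <= s <= t_lo <= t_hi < T:
        raise ValueError("need 0 <= s <= t_lo <= t_hi < T")
    delta = np.full(J, -np.inf)
    delta[0] = log_omega[s, 0]          # the path must start in state 1 at frame s
    results = {}
    for t in range(s, t_hi + 1):
        if t > s:
            stay = delta + log_a_stay
            move = np.full(J, -np.inf)
            move[1:] = delta[:-1] + log_a_next[:-1]   # state j-1 -> state j
            delta = np.maximum(stay, move) + log_omega[t]
        if t >= t_lo:
            results[t] = delta[-1] / (t - s + 1)      # normalize by segment length
    return results
```

All end frames in the range come out of the same recursion, so only the start frame $s$ remains to be scanned, which is the cost the following paragraph sets out to remove.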

In this way, the results for all end frames in $t' \pm V$ for one start frame $s$ are obtained in a single pass. Even with this method, however, the calculation over the hatched range must be repeated with the start frame shifted at every frame, so the amount of calculation remains very large. To reduce it further, the start frame can be determined automatically by the Dynamic Programming method. For that purpose, the recursive equation (Equation 10) is modified as follows.

Now define the optimum state series corresponding to $y_s, \dots, y_t$ as $X^* = x^*_s, x^*_{s+1}, \dots, x^*_t$. For the Dynamic Programming method to be applicable, when $s < m' < t$ and $x^*_{m'-1} = i$, $x^*_{m'} = j$ ($j \ne i$) in $X^*$ for an input frame $m'$, the optimum state series corresponding to the partial pattern $y_s, \dots, y_{m'}$ must agree with $x^*_s, \dots, x^*_{m'}$. That is, define $\phi_j(m')$ as the degree of occurrence of the partial pattern $y_s, \dots, y_{m'}$ for the state series $x^*_s, \dots, x^*_{m'}$, $z(i)$ as the section length (number of frames) of state $i$ at that time, and $\mathrm{dur}_i(z) = \mathrm{dur}_i(z(i))$ as the degree of remaining in state $i$ for $z$ frames. Then the following equation holds,
##EQU27##
where $j \ne i$.

Then the following equation holds.
##EQU28##

When $x^*_{m-1} = j$ and $x^*_m = h$ for some $m$ with $m' < m < t$, the following equation holds,
##EQU29##
where $h \ne j$.

Then the following equation must hold.
##EQU30##

In the above equations, $W_i$, $W_i'$, $w_{di}$, $w_{ai}$, $w_{bi}(k)$, $w_{di}'$, $w_{ai}'$, $w_{bi}(k)'$ ($i = 1, \dots, J$) and the like designate weighting coefficients along the state series or their sums, and $\Psi_i(m)$ designates the degree of occurrence of an input partial pattern $y_{s'}, \dots, y_{m-z(i)}$ corresponding to a state series $x_{s'}, \dots, x_{m-z(i)}$ ($\ne x^*_s, \dots, x^*_{m-z(i)}$). $W_i$ designates the total sum of the weighting coefficients along the state series $x^*_s, \dots, x^*_{m-z(i)}$, and $W_i'$ designates the total sum of the weighting coefficients along the state series $x_{s'}, \dots, x_{m-z(i)}$. By selecting the weighting coefficients appropriately, Equations 31 through 34 can always be made to hold irrespective of the state series. Clearly, for example, the above conditions are satisfied if $W_i = W_i'$, $w_{di}' = w_{di}$, $w_{ai}' = w_{ai}$, and $w_{bi}(1)' + \dots + w_{bi}(z(i)')' = w_{bi}(1) + \dots + w_{bi}(z(i))$. That is, for a state series that enters state $i$ from another state at input frame $m$, the sum of the weighting coefficients along the state series is made constant irrespective of $m$, the start frame $s$, and the path taken to the point $(m, i)$. With numerical values, one can set $w_{di} = w_{ai} = 1$ and $w_{bi}(k) = 1/z(i)$ for $i = 1, \dots, J$.
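The effect of choosing $w_{di} = w_{ai} = 1$ and $w_{bi}(k) = 1/z(i)$ can be checked with exact arithmetic. The hypothetical helper below sums the weights along a state series for a given list of state durations; it assumes, as described above, that each state contributes one $w_{di}$, one $w_{ai}$, and one $w_{bi}(k)$ per frame it occupies.

```python
from fractions import Fraction


def total_weight(durations, normalize=True):
    """Sum of weighting coefficients along a state series in which state i
    occupies durations[i] frames (hypothetical helper).

    normalize=True:  w_di = w_ai = 1 and w_bi(k) = 1/z(i), as in the text.
    normalize=False: w_bi(k) = 1 instead, for comparison.
    """
    total = Fraction(0)
    for z in durations:
        if z < 1:
            raise ValueError("each state must occupy at least one frame")
        w_b = Fraction(1, z) if normalize else Fraction(1)
        total += 1 + 1 + z * w_b        # w_di + w_ai + sum over k of w_bi(k)
    return total
```

Under the normalized choice any segmentation of a three-state series, for example durations (2, 3, 1) or (5, 1, 4), gets the same total weight of 9, whereas unit frame weights give totals that grow with the segment lengths (12 and 16 here). Such a length dependence would bias the comparison between partial sections with different start frames.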

From the above discussion, it is apparent that when the state changes to $j$ at input frame $t$, the degree of occurrence of the optimum partial section up to the point $(t, j)$ can be calculated from the following recursive equation, where $z$ is the number of frames of the partial input section corresponding to state $i$. ##EQU31##

In this case, the values of $i$ and $z$ that satisfy Equation 35 are designated as $i^*$ and $z^*$, and the following quantity is stored at the same time: $B_j(t) = B_{i^*}(t - z^*)$ (Equation 36).

Then, word spotting can be performed by the following steps.

(1) Initialization: $\Phi_1(t) = 0$ for $t = 1, \ldots, T$ ($\pi_1 = 1$, $\pi_i = 0$ for all $i \neq 1$); $\Phi_i(0) = -\infty$ for $i = 1, \ldots, J$; $B_1(t) = 0$ for $t = 1, \ldots, T$.

(2) Execute (3) and (4) for $t = 1, \ldots, T+1$.

(3) Execute Equation 26 and Equation 27 for $j = 1, \ldots, J+1$.

(4) $\Phi(t) = \Phi_{J+1}(t)$, $B(t) = B_{J+1}(t - z^*)$.

(5) Detect partial patterns:

End frame: ##EQU32##

Start frame: $B(t^*)$

In this way, $\Phi_i(t)$ in Equation 35 is calculated only once for each $(t, j)$. Moreover, the sum of $\log \omega_i(y_{t-z-1+k})$ over $k = 1, \ldots, z$ need not be recomputed from scratch for every value of $z$: when the sum over $k = 1, \ldots, z(m)$ is designated as $s(m)$, the next sum is obtained by a single addition, $s(m+1) = s(m) + \log \omega_i(y_{t-z+z(m)})$. Accordingly, the amount of calculation is considerably reduced.
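A minimal sketch of this kind of duration-based word spotting follows. It is an illustration under stated assumptions, not a transcription of Equation 35: the model is assumed to be strictly left-to-right with no state skips, the weights are the $w_{di} = w_{ai} = 1$, $w_{bi}(k) = 1/z(i)$ choice given earlier, durations are capped at a hypothetical `max_dur`, and the array layout (`log_omega`, `log_dur`, `log_a`) and function name are invented for illustration. The frame convention also differs slightly from the text: here `phi[j][t]` is the best score when the states before `j` have consumed frames up to `t`. The inner loop over `z` keeps a running sum of $\log \omega$, adding one frame per step instead of re-summing the whole segment.

```python
NEG_INF = float("-inf")


def spot_word(log_omega, log_dur, log_a, max_dur):
    """Explicit-duration word spotting over a left-to-right model (sketch).

    log_omega[i][f]: log omega_i(y) for state i (0-based) at frame f (0-based).
    log_dur[i][z]:   log dur_i(z) for z = 1..max_dur (index 0 is unused).
    log_a[i]:        log transition weight from state i to state i + 1.
    Returns (score, start_frame, end_frame) with 1-based frame numbers.
    """
    J, T = len(log_omega), len(log_omega[0])
    # phi[j][t]: best score when states 0..j-1 cover frames up to t (1-based),
    # so that state j (or the word exit when j == J) starts at frame t + 1.
    phi = [[NEG_INF] * (T + 1) for _ in range(J + 1)]
    back = [[0] * (T + 1) for _ in range(J + 1)]
    for t in range(T + 1):
        phi[0][t] = 0.0       # the word may start after any frame
        back[0][t] = t + 1    # remember the 1-based start frame
    for t in range(1, T + 1):
        for j in range(1, J + 1):
            i = j - 1         # state i occupies frames t-z+1 .. t
            running = 0.0     # running sum of log omega over the segment
            for z in range(1, min(max_dur, t) + 1):
                running += log_omega[i][t - z]   # adds frame t-z+1 (1-based)
                prev = phi[i][t - z]
                if prev == NEG_INF:
                    continue
                # weights w_d = w_a = 1 and w_b(k) = 1/z
                score = prev + log_a[i] + log_dur[i][z] + running / z
                if score > phi[j][t]:
                    phi[j][t] = score
                    back[j][t] = back[i][t - z]
    end = max(range(1, T + 1), key=lambda t: phi[J][t])
    return phi[J][end], back[J][end], end


# Two-state toy example: state 0 fits frames 3-4, state 1 fits frame 5.
log_omega = [[-5.0, -5.0, 0.0, 0.0, -5.0, -5.0],
             [-5.0, -5.0, -5.0, -5.0, 0.0, -5.0]]
log_dur = [[0.0, -2.0, 0.0, -1.0], [0.0, 0.0, -1.0, -1.0]]
log_a = [0.0, 0.0]
print(spot_word(log_omega, log_dur, log_a, max_dur=3))  # (0.0, 3, 5)
```

Because the per-frame observation weights of a state sum to 1 whatever its length, sections of different lengths compete on an equal footing, which is what makes the end frame a free variable here.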

Here, in the present invention, the following equation is established. ##EQU33##

Then, the following recursive equation is defined, corresponding to Equation 35. ##EQU34##

When the model of FIG. 2 is used, the following equation is established. ##EQU35##

Now, assuming for simplicity that an optimum state series has been calculated and the length of state $i$ in that series is designated as $z(i)^*$, the following equation is established. ##EQU36##

Then, the following equation is established. ##EQU37##

When the total sums of both sides are calculated, the following equations are established. ##EQU38## Accordingly, when the recursive equation of Equation 31 is used, $\Phi_{J+1}(t)$ includes, irrespective of the state series, the total sums over the states of the final term and of the second-to-last term on the right-hand side of the equation. This quantity is determined once the model to be compared is fixed, independently of the input pattern, and is irrelevant to the maximization problem of this equation. It is therefore not needed to find the optimum section of the input pattern, and $\omega_i(y_t)$ redefined by the following equation can be adopted. ##EQU39##

Next, an explanation will be given of a method of applying the above-mentioned FVQ to DP matching, which is the second embodiment of the invention of this application. FIG. 9 is a block diagram showing the principle of the present invention. Parts 91 and 93 operate similarly to parts 71 and 73 in FIG. 7. Numeral 92 designates a membership degree matrix calculation unit, corresponding to the distance matrix calculation unit 72 in FIG. 7, which in this embodiment calculates a membership degree matrix. That is, it calculates the membership degrees $u_{tm}$ ($m = 1, \ldots, M$; $u_{t1} + u_{t2} + \cdots + u_{tM} = 1$) of a feature vector $y_t$ with respect to each cluster $C_m$. The membership degree can be a membership function of the kind used in fuzzy theory; here, a membership degree similar to the one used in the above-mentioned FVQ/HMM, such as Equation 4, is used. A word dictionary 95 consists of membership degree matrices, each registered in advance for one word to be recognized. That is, the reference pattern of the r-th word is registered as the membership degree matrix produced by the membership degree matrix calculation unit 92 for its pronunciation. In FIG. 9, the membership degree of the j-th frame of the reference pattern of word r with respect to cluster m is indicated by $b^{(r)}_{jm}$. Numeral 96 designates the membership degree matrices for words $1, \ldots, R$.
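Equation 4 is defined earlier in the patent and is not reproduced here. The sketch below assumes it has the usual fuzzy c-means form, with squared Euclidean distances and a fuzziness exponent F, and builds the membership degree matrix that unit 92 would produce for an input pattern; applying the same function to a training utterance would give a word-dictionary entry. The function names and the default F = 2 are illustrative.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(y, codebook, fuzziness=2.0) -> List[float]:
    """Fuzzy c-means style membership degrees of y to every cluster.

    u_m = (1/d_m)^(1/(F-1)) / sum_n (1/d_n)^(1/(F-1)), d = squared distance.
    """
    d = [sq_dist(y, mu) for mu in codebook]
    exact = [m for m, dm in enumerate(d) if dm == 0.0]
    if exact:  # y coincides with one or more centroids
        return [1.0 / len(exact) if m in exact else 0.0 for m in range(len(d))]
    p = 1.0 / (fuzziness - 1.0)
    inv = [(1.0 / dm) ** p for dm in d]
    total = sum(inv)
    return [v / total for v in inv]


def membership_matrix(frames, codebook, fuzziness=2.0) -> List[List[float]]:
    """One membership vector (summing to 1) per input frame."""
    return [memberships(y, codebook, fuzziness) for y in frames]


codebook = [(0.0, 0.0), (2.0, 0.0)]
print(membership_matrix([(1.0, 0.0), (0.5, 0.0)], codebook))
```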

In this case, the similarity degree between frame $t$ of the input pattern at recognition time and frame $j$ of reference pattern $r$ is given as the similarity degree between the membership degree vectors $u_t = (u_{t1}, u_{t2}, \ldots, u_{tM})^T$ and $b^{(r)}_j = (b^{(r)}_{j1}, b^{(r)}_{j2}, \ldots, b^{(r)}_{jM})^T$. Here, $u_{tm} \ge 0$, $u_{t1} + \cdots + u_{tM} = 1$, $b^{(r)}_{jm} \ge 0$ and $b^{(r)}_{j1} + \cdots + b^{(r)}_{jM} = 1$, so both vectors can be regarded as probability distribution vectors (when $u_{tm}$ and $b^{(r)}_{jm}$ are posterior probabilities, they are exactly probability distributions). Therefore, as in the above-mentioned HMM, the similarity degree can be given by the Kullback-Leibler divergence, which is known as a distance between probability distributions. That is, the separation degree between distributions $(q_1, \ldots, q_M)$ and $(p_1, \ldots, p_M)$ can be given by the following equation. ##EQU40##

Using this, the inter-frame distance $d^{(r)}(t, j)$ can be given by the following three definitions. ##EQU41## Equation 45(1) defines a distance useful when the weighting coefficient of Equation 17(1) is adopted, Equation 45(2) defines a distance useful when the weighting coefficient of Equation 17(2) is adopted, and Equation 45(3) defines a distance useful when symmetry of the distance is a concern. A definition multiplied by a constant may be used in the addition style, and a definition raised to a constant power may be used in the multiplication style.
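The divergence and three inter-frame distances built from it can be sketched as follows. The assignment of directions is an inference from the surrounding text, not a transcription of the equation image: the later discussion drops a term that depends only on the input when Equation 45(1) is used and only on the reference when Equation 45(2) is used, which fits $D(u_t \| b_j)$ and $D(b_j \| u_t)$ respectively. The symmetric form is assumed here to be their sum, and all names are illustrative.

```python
import math


def kl(q, p):
    """D(q || p) = sum_m q_m log(q_m / p_m); terms with q_m = 0 contribute 0."""
    total = 0.0
    for qm, pm in zip(q, p):
        if qm > 0.0:
            if pm <= 0.0:
                return math.inf   # q puts mass where p has none
            total += qm * math.log(qm / pm)
    return total


def frame_distances(u_t, b_j):
    """Three candidate inter-frame distances between an input membership
    vector u_t and a reference membership vector b_j."""
    forward = kl(u_t, b_j)    # weighted by the input distribution
    backward = kl(b_j, u_t)   # weighted by the reference distribution
    return {"input_side": forward, "reference_side": backward,
            "symmetric": forward + backward}


print(frame_distances([0.5, 0.5], [0.9, 0.1]))
```

The infinite result for a zero reference probability is why, in the storage-reduction discussion, the distribution in the denominator of the logarithm must be kept strictly positive.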

A matching unit 94 performs DP matching between the membership degree matrices of the respective words registered in the word dictionary 95 and the membership degree matrix obtained from the input pattern. That is, the recursive equation (Equation 18) is calculated based on the inter-frame distance $d^{(r)}(t, j)$ specified by Equation 45, and the accumulated distance $D^{(r)}$ defined by Equation 13(a) is calculated. A determination unit 97 calculates Equation 14, which provides the recognition result.

In this way, each feature vector of the feature vector series pronounced to build the word dictionary (the patterns to be compared) is not replaced by only one phoneme-like unit but is made to correspond, frame by frame, to a plurality of phoneme-like units with membership degrees. Hence, the adverse influence of the quantization error of the conventional example can be alleviated. Further, as is apparent from the above explanation, the present invention is characterized in that both the calculation of membership degrees and the calculation of distances between frames of a reference pattern and input frames are based on mathematically clear definitions.

Next, an explanation will be given of a further improvement to the present invention.

First, an explanation will be given of the case where Equation 45(1) is used as a distance measure.

In this case, the inter-frame distance is specified below. ##EQU42##

When this is substituted into Equation 13 with Equation 17(1) as the weighting coefficient, the following equation is established. ##EQU43##

Here, $n$ is specified by $1 \le n \le k-1$ such that $t(k) - t(k-n) = 1$ (there is no jump with respect to an input pattern frame in the matching path). When the sum of the weighting coefficients along the path from $x(k-n)$ to $x(k)$ is 1, that is, $w(k-n+1) + w(k-n+2) + \cdots + w(k) = 1$, as in the cases shown in FIGS. 10 through 14, Equation 47 becomes as follows. ##EQU44##

The examples of FIGS. 10 through 14 show cases for $x(k) = (t, j)$, $k-1 \ge n \ge 1$, where in FIG. 10, $x(k-1) = (t-1, j)$ or $(t-1, j-n)$; in FIGS. 11 and 14, $x(k-1) = (t-1, j)$ or $(t-1, j-1)$ and $x(k-n) = (t-1, j-m)$ for $m = 2, \ldots, n$; and in FIGS. 12 and 13, $x(k-1) = (t-1, j)$, $(t-1, j-1)$ or $(t, j-1)$ and $x(k-m) = (t, j-m)$, $x(k-n) = (t-1, j-n)$ for $m = 2, \ldots, n-1$. The numerical values next to the paths in the respective figures are examples of weighting coefficients along those paths.

Since the first term on the right-hand side of Equation 48 is independent of the choice of path and of the reference pattern, it is a quantity determined only by the input pattern. Accordingly, it can be omitted when only the relative ranking of the comparison results between the respective reference patterns and the input pattern matters. By omitting that term and changing the sign, the following equation can serve as a similarity degree between patterns. ##EQU45##

In this case, the inter-frame similarity degree between input frame $t$ and frame $j$ of a reference pattern can be defined as follows. ##EQU46##
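Assuming Equation 45(1) is the divergence of the input membership vector from the reference one, $d^{(r)}(t,j) = \sum_m u_{tm} \log(u_{tm}/b^{(r)}_{jm})$, the reason the input-only term can be dropped can be written out as below. This is a reconstruction from the surrounding text, not a copy of the equation images, and it assumes the path visits every input frame $t = 1, \ldots, I$ with step weights summing to 1 per input frame.

```latex
d^{(r)}(t,j) = \sum_{m=1}^{M} u_{tm}\log u_{tm} \;-\; \sum_{m=1}^{M} u_{tm}\log b^{(r)}_{jm}

% With the weights of the steps entering each input frame summing to 1,
% the first term accumulates to the same value for every path and every r:
\sum_{k} w(k)\sum_{m=1}^{M} u_{t(k)m}\log u_{t(k)m} \;=\; \sum_{t=1}^{I}\sum_{m=1}^{M} u_{tm}\log u_{tm}

% Dropping it and changing the sign leaves the inter-frame similarity
s^{(r)}(t,j) = \sum_{m=1}^{M} u_{tm}\log b^{(r)}_{jm}
```

If $\log b^{(r)}_{jm}$ is stored instead of $b^{(r)}_{jm}$, this similarity is a plain product-sum.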

Here, when $t(k) - t(k-1) = 1$ (there is neither a jump nor a duplication with respect to an input pattern frame in the matching path, corresponding to FIG. 6 or FIG. 10), the following equation is established. ##EQU47##

In the above equation, $j = j(t)$ designates the function describing the matching path in the $t$-$j$ plane, obtained by eliminating $k$ from $t = t(k)$ and $j = j(k)$. When the paths in FIG. 6 are used, the matching unit 94 calculates the recursive equation specified below, based on the inter-frame similarity degree $s^{(r)}(t, j)$, from which the accumulated similarity degree $S^{(r)}$ defined by Equation 51 is obtained. ##EQU48## where $f(1, 1) = s^{(r)}(1, 1)$ and $S^{(r)} = f(x(K)) = f(I, J^{(r)})$.

The determination unit 97 calculates the following equation, which provides the recognition result. ##EQU49##
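The matching and determination steps can be sketched as follows. FIG. 6 is not reproduced here, so the local path set $(t-1, j)$, $(t-1, j-1)$, $(t-1, j-2)$, which advances exactly one input frame per step, is an assumption, as are all function names. The frame similarity is taken to be the product-sum $\sum_m u_{tm} \log b^{(r)}_{jm}$ with stored $\log b^{(r)}_{jm}$, on the assumption that Equation 45(1) is $D(u_t \| b^{(r)}_j)$.

```python
import math

NEG_INF = -math.inf


def frame_similarity(u_t, log_b_j):
    """s(t, j) = sum_m u_tm * log b_jm (product-sum with stored log b)."""
    return sum(u * lb for u, lb in zip(u_t, log_b_j))


def accumulate(u, log_b):
    """Accumulated similarity f(I, J) for one reference pattern.

    Assumed local paths (one step per input frame): (t-1, j), (t-1, j-1),
    (t-1, j-2). Row/column 0 of f is -inf padding; indices are 1-based.
    """
    I, J = len(u), len(log_b)
    f = [[NEG_INF] * (J + 1) for _ in range(I + 1)]
    f[1][1] = frame_similarity(u[0], log_b[0])
    for t in range(2, I + 1):
        for j in range(1, J + 1):
            best = max(f[t - 1][j], f[t - 1][j - 1],
                       f[t - 1][j - 2] if j >= 2 else NEG_INF)
            if best > NEG_INF:
                f[t][j] = best + frame_similarity(u[t - 1], log_b[j - 1])
    return f[I][J]


def recognize(u, log_b_by_word):
    """Return the word whose reference pattern gives the largest S(r)."""
    scores = {word: accumulate(u, lb) for word, lb in log_b_by_word.items()}
    return max(scores, key=scores.get), scores


def logs(rows):
    return [[math.log(x) for x in row] for row in rows]


words = {"a": logs([[0.8, 0.2], [0.2, 0.8]]),
         "b": logs([[0.2, 0.8], [0.8, 0.2]])}
u = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
print(recognize(u, words)[0])  # a
```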

The above-mentioned method, in which $n$ is specified by $1 \le n \le k-1$ such that $t(k) - t(k-n) = 1$ (no jump with respect to an input pattern frame in the matching path) and the sum of the weighting coefficients along the path from $x(k-n)$ to $x(k)$ is 1, that is, $w(k-n+1) + w(k-n+2) + \cdots + w(k) = 1$, is useful in continuous word speech recognition. The reason is that, in this way, the problem of finding the optimum concatenation of individually registered word reference patterns that is most similar to a continuously pronounced input speech pattern can be solved efficiently by the well-known two-stage DP or similar methods. The inter-frame similarity degree proposed here is applicable to such a case and can provide high recognition performance with simple calculation.

Next, an explanation will be given of the case in which Equation 45(2) is used as a distance measure.

In this case, the inter-frame distance is specified below. ##EQU50##

When this is substituted into Equation 13 with Equation 17(2) as the weighting coefficient, the following equation is established. ##EQU51##

Consider the case where $n$ is specified by $1 \le n \le k-1$ such that $j(k) - j(k-n) = 1$ (no jump with respect to a reference pattern frame in the matching path) and the sum of the weighting coefficients along the path from $x(k-n)$ to $x(k)$ equals 1, that is, $w(k-n+1) + w(k-n+2) + \cdots + w(k) = 1$. For example, in the cases of FIGS. 15 through 17, Equation 55 becomes ##EQU52##

The examples of FIGS. 15 through 19 show cases for $x(k) = (t, j)$, $k-1 \ge n \ge 1$, where in FIG. 15, $x(k-1) = (t, j-1)$ or $(t-n, j-1)$; in FIGS. 16 and 19, $x(k-1) = (t, j-1)$ or $(t-1, j-1)$ and $x(k-m) = (t-m, j-1)$ for $m = 2, \ldots, n$; and in FIGS. 17 and 18, $x(k-1) = (t, j-1)$, $(t-1, j-1)$ or $(t-1, j)$ and $x(k-m) = (t-m, j)$, $x(k-n) = (t-n, j-1)$ for $m = 2, \ldots, n-1$. The numerical values next to the paths in the respective figures are examples of weighting coefficients along those paths.

In this case, the first term on the right-hand side of Equation 56 is independent of the choice of path and of the section of the input pattern, and is a quantity determined only by the reference pattern (for reference pattern $r$ this quantity is designated $C^{(r)}$). Therefore, it can be omitted when only the relative ranking of the comparison results between a given reference pattern and various sections of an input continuous word speech pattern (or various input patterns) matters. By omitting that term and changing the sign, the following equation can serve as a similarity degree between patterns. ##EQU53##

In this case, the inter-frame similarity degree between input frame $t$ and frame $j$ of reference pattern $r$ can be specified as follows. ##EQU54##

When this definition of the inter-frame similarity degree is used to determine which reference pattern the input pattern is closest to, the values $(S^{(r)} - C^{(r)})/J^{(r)}$ are compared and the maximum is selected.
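Under the reading that Equation 45(2) is $D(b^{(r)}_j \| u_t)$ (an assumption), the dropped term is $C^{(r)} = \sum_j \sum_m b^{(r)}_{jm} \log b^{(r)}_{jm}$ and the remaining frame similarity is $\sum_m b^{(r)}_{jm} \log u_{tm}$. The sketch below computes both, and the normalized score used to compare references of different lengths $J^{(r)}$; the function names are illustrative.

```python
import math


def reference_constant(b):
    """C(r) = sum_j sum_m b_jm log b_jm for a reference membership matrix b."""
    return sum(x * math.log(x) for row in b for x in row if x > 0.0)


def reference_side_similarity(b_j, u_t):
    """s(t, j) = sum_m b_jm log u_tm; requires u_tm > 0 wherever b_jm > 0."""
    return sum(x * math.log(y) for x, y in zip(b_j, u_t) if x > 0.0)


def normalized_score(S, b):
    """(S(r) - C(r)) / J(r): larger is better when comparing references."""
    return (S - reference_constant(b)) / len(b)


b = [[0.5, 0.5], [0.9, 0.1]]
S = sum(reference_side_similarity(bj, bj) for bj in b)   # a perfect match
print(normalized_score(S, b))   # zero up to rounding
```

Since $S^{(r)} - C^{(r)}$ is minus the accumulated divergence, dividing by $J^{(r)}$ compares the average divergence per reference frame, which is why a perfect match scores zero and every other match scores below it.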

Here, when $j(k) - j(k-1) = 1$ additionally holds (neither a jump nor a duplication with respect to a reference pattern frame in the matching path, corresponding to FIG. 20 or FIG. 15), the following equation is derived. ##EQU55##

In the above equation, $t = t(j)$ designates the function describing the matching path in the $t$-$j$ plane, obtained by eliminating $k$ from $t = t(k)$ and $j = j(k)$. When the paths in FIGS. 15 through 19 are used, the matching unit 94 calculates the recursive equation specified below, based on the inter-frame similarity degree $s^{(r)}(t, j)$ defined by Equation 58, from which the accumulated similarity degree $S^{(r)}$ defined in Equation 59 is obtained. ##EQU56## where $f(1, 1) = s^{(r)}(1, 1)$ and $S^{(r)} = f(x(K)) = f(I, J^{(r)})$.

The above-mentioned method, in which $n$ is specified by $1 \le n \le k-1$ such that $j(k) - j(k-n) = 1$ (no jump with respect to a reference pattern frame in the matching path) and the sum of the weighting coefficients along the path from $x(k-n)$ to $x(k)$ equals 1, that is, $w(k-n+1) + w(k-n+2) + \cdots + w(k) = 1$, is useful for so-called word spotting, in which the partial sections of a continuously spoken input pattern that best match a given reference pattern are identified. In this case, when the reference pattern to be compared is designated $r$, only the values $S^{(r)}$ of the respective sections are compared, irrespective of the lengths of the input sections. That is, the word spotting problem can be solved efficiently by applying the Dynamic Programming method in the following steps. The inter-frame similarity degree proposed here is applicable to such a case and can provide high recognition performance with simple calculation. For example, word spotting for a given word is performed as follows using the path restriction condition of FIG. 20.

(1) Initialization: $f(0, j) = f(-1, j) = -\infty$ for $j = -1, 0, 1, \ldots, J$; $f(0, 0) = 0$.

(2) Execute (3) through (6) for $t = 1, \ldots, T+1$.

(3) $f(t, 0) = -\infty$.

(4) $f(t, 1) = s(t, 1)$.

(5) $B(t, 1) = t - 1$.

(6) Calculate the following recursive equation for $j = 2, \ldots, J$. ##EQU57##

(7) $D(t) = f(t, J)$, $B(t) = B(t, J)$.

(8) Detect the partial pattern:

End frame: ##EQU58##

Start frame: $B(t^*) + 1$
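The step list above can be sketched in code as follows. The recurrence of step (6) and FIG. 20 are not reproduced in the text, so the local path constraint is an assumption: each reference frame $j$ is reached from reference frame $j-1$ at input frame $t-1$ or $t-2$, which is consistent with the initialization of $f(0, j)$ and $f(-1, j)$ in step (1). The matrix `sim` stands for $s(t, j)$ of the word being spotted, and the names are illustrative. Only the rows for $t-1$ and $t-2$ are kept, and the loop runs over the $T$ real input frames.

```python
import math

NEG_INF = -math.inf


def spot(sim):
    """Word spotting by DP over a T x J similarity matrix sim[t-1][j-1].

    Assumed predecessors of (t, j): (t-1, j-1) and (t-2, j-1).
    Returns (best score, start frame, end frame), frames 1-based.
    """
    T, J = len(sim), len(sim[0])
    f2, b2 = [NEG_INF] * (J + 1), [0] * (J + 1)   # f(t-2, .), B(t-2, .)
    f1, b1 = [NEG_INF] * (J + 1), [0] * (J + 1)   # f(t-1, .), B(t-1, .)
    best = (NEG_INF, 0, 0)
    for t in range(1, T + 1):
        f = [NEG_INF] * (J + 1)        # index 0 plays the role of f(t, 0)
        b = [0] * (J + 1)
        f[1] = sim[t - 1][0]           # a match may start at any input frame
        b[1] = t - 1
        for j in range(2, J + 1):
            if f1[j - 1] >= f2[j - 1]:
                prev, origin = f1[j - 1], b1[j - 1]
            else:
                prev, origin = f2[j - 1], b2[j - 1]
            if prev > NEG_INF:
                f[j] = prev + sim[t - 1][j - 1]
                b[j] = origin
        if f[J] > best[0]:             # D(t) = f(t, J); keep the best end frame
            best = (f[J], b[J] + 1, t)
        f2, b2, f1, b1 = f1, b1, f, b
    return best


# Toy input: the 3-frame word lies on input frames 3, 4 and 5.
sim = [[-1] * 3 for _ in range(6)]
sim[2][0] = sim[3][1] = sim[4][2] = 1
print(spot(sim))  # (3, 3, 5)
```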

Next, an explanation will be given of a method of reducing the memory storage and the computational burden in the above-mentioned HMM and DP, which is the third embodiment of the present invention.

The basic idea is to store the membership degrees of the reference pattern only for the top N (< M) clusters in order to reduce storage, and to calculate the membership degrees of the input pattern only for the top K (< M) clusters in order to reduce computation. In this case, attention should be paid to the fact that, when the similarity between a probability distribution $(p_1, \ldots, p_M)$ and a probability distribution $(q_1, \ldots, q_M)$ is defined as follows, $p_i = 0$ is possible for some $i \in \{1, \ldots, M\}$, but $q_i > 0$ must hold for all $i \in \{1, \ldots, M\}$; $q_i$ cannot be 0. ##EQU59## Accordingly, when only the top N of the $q_i$ are calculated or stored, a common value is determined for the remaining $q_i$ and used such that $q_1 + \cdots + q_M = 1$. The storage needed for $q_i$ ($i = 1, \ldots, M$) is then N for $q_{g(1)}, \ldots, q_{g(N)}$ and 1 for $q_{g(N+1)}, \ldots, q_{g(M)}$. In this notation, $g(n)$ is the suffix of the n-th largest $q$ among $\{q_1, \ldots, q_M\}$. The $p_i$ can be treated in the same way as the $q_i$, by classifying them into those of the top K orders and those of order K+1 and lower (K is not necessarily equal to N); however, since the $p_i$ may be 0, it is possible to set $p_{h(1)} + \cdots + p_{h(K)} = 1$ and $p_{h(K+1)} + \cdots + p_{h(M)} = 0$, where $h(k)$ is the suffix of the k-th largest $p$ among $\{p_1, \ldots, p_M\}$. In this case, the storage needed for $p_i$ ($i = 1, \ldots, M$) is only K, for $p_{h(1)}, \ldots, p_{h(K)}$. Both $\omega^{(r)}_i(y_t)$ in the above-mentioned multiplication type FVQ/HMM (a superscript $(r)$ attached to $\omega_i(y_t)$, $b_{im}$, $a_{ij}$, etc. indicates that the quantity relates to word r) and all $s^{(r)}(t, j)$ in the multiplication type FVQ/DP have the form of Equation 63, so the same reduction in storage and computation applies to both.

Therefore, the embodiments will be explained with respect to the multiplication type FVQ/HMM, that is, $\omega^{(r)}_i(y_t)$. If state $j$ of the HMM is read as the j-th frame of the reference pattern in DP, and the occurrence probability $b^{(r)}_{jm}$ of cluster m in state j of the HMM is read as the membership degree of the j-th frame of reference pattern r to cluster m in DP matching, the same discussion as for the HMM also applies to DP matching.
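The storage scheme for the $q_i$ (the distribution that must stay strictly positive) can be sketched as follows: keep the labels and values of the top N clusters and a single common value for the rest. This is an illustration, not the patent's exact procedure; the `floor` guard for the case where the discarded mass is zero, the rescaling step, and the function names are assumptions added so that the common value never becomes 0.

```python
def compress(probs, n, floor=1e-6):
    """Keep the n most probable (label, value) pairs plus one common value.

    The common value is shared by the remaining M - n clusters and is kept
    strictly positive (at least `floor`, a hypothetical parameter) so that its
    log stays finite; kept values are rescaled so that everything sums to 1.
    """
    order = sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)
    top = [(m, probs[m]) for m in order[:n]]
    rest = len(probs) - n
    if rest == 0:
        return top, 0.0
    kept = sum(v for _, v in top)
    common = max((1.0 - kept) / rest, floor)
    scale = (1.0 - rest * common) / kept
    return [(m, v * scale) for m, v in top], common


def expand(top, common, M):
    """Rebuild the full length-M distribution from the compressed form."""
    full = [common] * M
    for m, v in top:
        full[m] = v
    return full


top, common = compress([0.5, 0.2, 0.1, 0.1, 0.1], n=2)
print(top, common)  # storage: 2 labels, 2 values and 1 common value
```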

The following methods are conceivable for reducing the storage of $u_{tm}$ and $b^{(r)}_{jm}$ in the definition of $\omega^{(r)}_j(y)$. In the following equations, the suffix $g(r, j, n)$ denotes the name (number) of the cluster whose occurrence probability in the j-th state of HMM r is of the n-th order; $b^{(r)}_{j,g(r,j,n)}$ denotes the occurrence probability of cluster $g(r, j, n)$ in the j-th state of HMM r; $h(t, k)$ denotes the name of the cluster whose membership degree for the feature vector of the t-th input frame is of the k-th order; and $u_{t,h(t,k)}$ denotes the membership degree of $y_t$ to cluster $h(t, k)$.

(First Method) ##EQU60##

In the above equation, for $b^{(r)}_{j,g(r,j,n)}$ the estimated values are used as they are for $n = 1, \ldots, N$, and for $N+1 \le n \le M$ they are specified as follows. ##EQU61##

For $u_{tm}$, the estimated values may be used for all $1 \le m \le M$ (1.1), or $u_{t,h(t,k)}$ may be estimated as specified below for $1 \le k \le K$, with $u_{t,h(t,k)} = 0$ for $K+1 \le k \le M$ (1.2). ##EQU62##

In the case of (1.2), the membership degree calculation is reduced at the same time (described later).

(Second Method) ##EQU63## where $b^{(r)}_{j,g(r,j,n)}$ is estimated as specified below for $1 \le n \le N$, and $b^{(r)}_{j,g(r,j,n)} = 0$ for $N+1 \le n \le M$. ##EQU64## For $u_{t,h(t,k)}$, the estimated values may be used for all $1 \le k \le M$ (2.1), or values similar to the above may be used for $1 \le k \le K$, with the following equation for $K+1 \le k \le M$ (2.2). ##EQU65##

In the case of (2.2), the membership degree calculation is reduced at the same time (described later).

(Third Method) ##EQU66## where, for $u_{tm}$, $u_{t,h(t,k)}$ takes the values specified below for $k = 1, \ldots, K$. ##EQU67## For $b^{(r)}_{j,g(r,j,n)}$, the estimated values may be used for all $n = 1, \ldots, M$ (3.1), or they may be estimated as specified below, setting $b^{(r)}_{j,g(r,j,n)} = 0$ for $N+1 \le n \le M$ (3.2). ##EQU68## In the case of (3.2), the storage is reduced at the same time.

(Fourth Method) ##EQU69## where $u_{t,h(t,k)}$ may be estimated as specified below for $1 \le k \le K$, with $u_{t,h(t,k)} = 0$ for $K+1 \le k \le M$. ##EQU70## For $b^{(r)}_{j,g(r,j,n)}$, the estimated values may be used for all $1 \le n \le M$ (4.1), or the estimated values may be used for $1 \le n \le N$ with the values specified below for $N+1 \le n \le M$ (4.2). ##EQU71## In the case of (4.2), the storage is reduced at the same time.

In the first method, the second method, (3.2) of the third method and (4.2) of the fourth method, the occurrence probabilities of the clusters in each state of the HMM (or the membership degrees of the word reference patterns in DP) are not stored for all clusters; instead, for each state of the HMM (each frame of a DP reference pattern), the labels and the probabilities (membership degrees) are stored only for the clusters ranked up to the N-th order in probability (membership degree). For example, the HMM (reference pattern) for the r-th word is as shown in FIG. 21 or FIG. 22. FIG. 21 can be used when the similarity degree is defined by Equation 67 and Equation 70, and FIG. 22 when it is defined by Equation 64 and Equation 73.

In (1.2) of the first method, (2.2) of the second method, the third method and the fourth method, the membership degree matrix of the input pattern is not calculated for all clusters; instead, for each frame of the input pattern, membership degrees are calculated only for the clusters ranked up to the K-th order in membership degree. For example, the input pattern is as shown in FIG. 23 or FIG. 24. FIG. 23 can be used when the similarity degree is defined by Equation 64 and Equation 73, and FIG. 24 when it is defined by Equation 67 and Equation 70.

In the case of Equation 64 and Equation 73, if $\log b^{(r)}_{j,g(r,j,n)}$ is stored as the membership matrix of the reference patterns in place of $b^{(r)}_{j,g(r,j,n)}$ in FIG. 22 (not shown), this calculation becomes a product-sum calculation. In this case, the computational burden increases relative to the conventional example of FIG. 7: assuming N = 3 in Equation 64 and K = 3 in Equation 73, the number of multiplications increases by 3 at each lattice point, and accordingly the number of multiplications is 2560 + 3×50×100 = 4060. The computational burden is therefore certainly larger than in the conventional example of FIG. 7, but considerably smaller than in the case of FIG. 5, while higher recognition accuracy can be provided than in the conventional example of FIG. 7.

In the case of Equation 67 and Equation 70, if $\log u_{t,h(t,k)}$ is stored in place of $u_{t,h(t,k)}$ in FIG. 24 (not shown) as the membership matrix of the input pattern, this calculation becomes a product-sum calculation. In this case, the amount of calculation increases relative to the conventional example of FIG. 7: assuming N = 3 in Equation 67 and K = 3 in Equation 70, the number of multiplications increases by 3 at each lattice point, and accordingly the number of multiplications is 2560 + 3×50×100 = 4060. This is certainly more than in the conventional example of FIG. 7, but considerably less than in the case of FIG. 5, and it results in higher recognition accuracy than the conventional example of FIG. 7. Compared with the case of the preceding paragraph, where $\log b_{im}$ is stored, $\log u_{t,h(t,k)}$ must be calculated for each frame of the input pattern. However, if K = 3, this is done only 3 times per frame, and $u_{t,h(t,k)}$ takes only values between 0 and 1. Therefore, if a table of $\log x$ for $0 \le x \le 1$ is prepared, a table look-up can replace the calculation.
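A look-up table of the kind suggested in the last sentence might look like this. The resolution `size`, the `floor` substituted for $\log 0$, and the class name are illustrative choices, not values from the text. Nearest-entry look-up is least accurate close to 0, where $\log x$ changes fastest; a finer table or interpolation would be the natural refinement.

```python
import math


class LogTable:
    """Look-up table for log x on 0 <= x <= 1.

    `size` (table resolution) and `floor` (value used in place of log 0) are
    illustrative choices, not values from the text.
    """

    def __init__(self, size=1024, floor=1e-6):
        self.size = size
        self.values = [math.log(max(i / size, floor)) for i in range(size + 1)]

    def __call__(self, x):
        # nearest table entry; x is a membership degree in [0, 1]
        return self.values[round(x * self.size)]


log_table = LogTable()
print(log_table(0.5), math.log(0.5))
```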

If the membership degree is defined by Equation 4 with $u_{t,h(t,1)} + \cdots + u_{t,h(t,K)} = 1$ and $u_{t,h(t,K+1)} = \cdots = u_{t,h(t,M)} = 0$, the order of decreasing $u_{tm}$ is the same as the order of increasing $d(y_t, \mu_m)$. Accordingly, $d(y_t, \mu_m)$ is first calculated for all clusters, and $u_{tm}$ of the top K orders need only be calculated for the K clusters with the smallest $d(y_t, \mu_m)$, which reduces the amount of calculation. That is, assume that the following equation is established. ##EQU72##

The membership degree for $1 \le k \le K$ is given as follows. ##EQU73## In this case, both the fraction calculation in the denominator of Equation 76 and the calculation of Equation 77 are performed only K times. If M = 256 and K = 3 to 6, the amount of calculation falls to between about 1/80 and 1/40.
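Assuming Equation 4 has the fuzzy c-means form $u_{tm} \propto d(y_t, \mu_m)^{-1/(F-1)}$ with a fuzziness exponent F (an assumption; the equation is defined elsewhere in the patent), the top-K computation can be sketched as follows. All M distances are computed once; the fractional powers and divisions are then done only for the K nearest clusters. The function name is illustrative.

```python
import heapq


def topk_memberships(d, K, fuzziness=2.0):
    """Membership degrees for the K nearest clusters only; all others are 0.

    d: distances d(y_t, mu_m) to all M clusters (computed once, M values).
    Only K power/division operations follow, instead of M.
    """
    nearest = heapq.nsmallest(K, range(len(d)), key=d.__getitem__)
    if d[nearest[0]] == 0.0:          # y_t sits exactly on a centroid
        return {nearest[0]: 1.0}
    p = 1.0 / (fuzziness - 1.0)
    inv = {m: (1.0 / d[m]) ** p for m in nearest}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}


print(topk_memberships([4.0, 1.0, 2.0, 8.0], K=2))
```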

When the membership degree is defined by Equation 4 with $u_{t,h(t,K+1)} = \cdots = u_{t,h(t,M)} = u_{t0}$ and $u_{t,h(t,1)} + \cdots + u_{t,h(t,M)} = 1$, the membership degrees are calculated by the following Equation 79, with D specified by Equation 78. ##EQU74## (Equation 79) i) For $m \in \{h(1), \ldots, h(K)\}$, the membership degree is defined as ##EQU75## ii) For $m \in \{h(K+1), \ldots, h(M)\}$, the membership degree is defined as a common value, ##EQU76## The denominator of Equation 78 requires M fraction calculations. Since the order of decreasing $u_{tm}$ is the same as the order of increasing $d(y_t, \mu_m)$, $d(y_t, \mu_m)$ is first calculated for all clusters, and $u_{tm}$ in i) of Equation 79 need only be calculated for the K clusters with the smallest $d(y_t, \mu_m)$.

The following is a method for further simplification. For example, putting $d_{t0} = \{d(y_t, \mu_{h(K+1)}) + \cdots + d(y_t, \mu_{h(M)})\}/(M-K)$ or $d_{t0} = \{d(y_t, \mu_{h(K+1)}) + d(y_t, \mu_{h(M)})\}/2$, and approximating $d(y_t, \mu_{h(K+1)}) = \cdots = d(y_t, \mu_{h(M)}) = d_{t0}$, Equation 78 is approximated as follows. ##EQU77##
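With the same fuzzy c-means assumption for Equation 4, the mean-distance variant of this approximation can be sketched as follows. It returns the K individual membership degrees and the common value $u_{t0}$, needs only K + 1 fractional powers, and the total over all M clusters is exactly 1 by construction. It assumes all distances are positive and K < M; the function name is illustrative.

```python
def memberships_with_common(d, K, fuzziness=2.0):
    """Top-K membership degrees plus one common value for the other M - K clusters.

    The distances of the clusters ranked K+1..M are replaced by their mean d_t0,
    so only K + 1 powers are needed. Assumes all distances are positive and K < M.
    """
    order = sorted(range(len(d)), key=d.__getitem__)
    nearest, rest = order[:K], order[K:]
    p = 1.0 / (fuzziness - 1.0)
    d_t0 = sum(d[m] for m in rest) / len(rest)   # mean distance of the rest
    inv = {m: (1.0 / d[m]) ** p for m in nearest}
    inv0 = (1.0 / d_t0) ** p
    total = sum(inv.values()) + len(rest) * inv0
    return {m: v / total for m, v in inv.items()}, inv0 / total


top, common = memberships_with_common([1.0, 2.0, 4.0, 4.0], K=2)
print(top, common)  # {0: 0.5, 1: 0.25} 0.125
```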

Alternatively, the membership degree calculating means may calculate the membership degrees from the distances between the observation vector and the representative vectors of the respective clusters, ranking the cluster with the smallest distance first. A predetermined constant value of 1/K or less is assigned to the clusters of the (K+1)-th and lower orders, and the membership degrees of the clusters of orders 1 to K are calculated in turn (starting from the smallest distance) from their individual distances and the above-mentioned constant so that the total sum of the membership degrees is 1.

In the case of multiplication type DP matching, the reference pattern is a series of membership degree vectors. When the reference pattern registers membership degrees only for the clusters ranked up to the N-th order, the same method as applied above to $u_{tm}$ can be used. That is, when $b^{(r)}_{j,g(r,j,1)} + \cdots + b^{(r)}_{j,g(r,j,N)} = 1$ and $b^{(r)}_{j,g(r,j,N+1)} = \cdots = b^{(r)}_{j,g(r,j,M)} = 0$, $b^{(r)}_{jm}$ can be calculated in correspondence with Equation 76 and Equation 77 by substituting $K \to N$, $h(t,k) \to g(r,j,n)$ and $u_{t,h(t,k)} \to b^{(r)}_{j,g(r,j,n)}$. Similarly, when $b^{(r)}_{j,g(r,j,N+1)} = \cdots = b^{(r)}_{j,g(r,j,M)}$ and $b^{(r)}_{j,g(r,j,1)} + \cdots + b^{(r)}_{j,g(r,j,M)} = 1$, $b^{(r)}_{jm}$ can be calculated in correspondence with Equation 78, Equation 79, Equation 80 or the like.

Next, an explanation will be given of a fourth embodiment of the application. This embodiment is effective in the above-mentioned case of an HMM. The idea comes from the fact that, in theory, the evaluation of $u_{tm}$ is not affected even if the K used for estimating $b_{im}$ differs from the K used for recognition. Setting aside model formation, it is often preferable that recognition use as few calculations as possible. The discrete type HMM involves the fewest calculations, which corresponds to recognition with K = 1 in the FVQ/HMM. Accordingly, the FVQ/HMM method can be used to form the model and the discrete type HMM method for recognition. As mentioned above, FVQ is more effective in alleviating the problem of an insufficient number of learning samples when learning HMM parameters, thereby increasing the estimation accuracy of the parameters, than in reducing the quantization distortion of vector quantization through interpolation. When the model is formed by the FVQ type and recognition is performed by the discrete type, the recognition rate deteriorates little compared with using the FVQ type for both model formation and recognition. Moreover, when the code book size is large, an improvement in recognition rate over using the discrete type for both model formation and recognition can be verified experimentally.
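The split between training and recognition can be illustrated as follows. The table $\log b_{jm}$ is assumed to have been estimated with FVQ; at recognition time the input frame is simply vector-quantized, so the observation score is a single look-up. With a one-hot membership vector (K = 1), the FVQ product-sum reduces to the same look-up. The codebook, probabilities and function names are illustrative.

```python
import math


def nearest_label(y, codebook):
    """Index of the closest representative vector (plain vector quantization)."""
    return min(range(len(codebook)),
               key=lambda m: sum((a - b) ** 2 for a, b in zip(y, codebook[m])))


def log_omega_fvq(u_t, log_b_j):
    """FVQ/HMM observation score: sum_m u_tm log b_jm over all clusters."""
    return sum(u * lb for u, lb in zip(u_t, log_b_j))


def log_omega_discrete(y, codebook, log_b_j):
    """Discrete-HMM score at recognition time: one table look-up (K = 1)."""
    return log_b_j[nearest_label(y, codebook)]


codebook = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
log_b_j = [math.log(p) for p in (0.6, 0.3, 0.1)]   # b_jm estimated with FVQ
print(log_omega_discrete((1.8, 0.1), codebook, log_b_j))
```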

Further, in the case of matching by linear expansion and contraction of the time axis, the input pattern can be compared with the reference pattern based on comparison between the above-mentioned membership degree vectors. When the time axis is linearly expanded or contracted to match the number of frames of a reference pattern to the number of frames of an input pattern, the similarity degree definition of Equation 7 can be used. When it is linearly expanded or contracted to match the number of frames of an input pattern to the number of frames of a reference pattern, the similarity degree definition of Equation 43 can be used.
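As an illustration of linear time-axis matching, the sketch below maps each input frame to the linearly corresponding reference frame and sums frame similarities. Equations 7 and 43 are not reproduced in this section, so the product-sum $\sum_m u_{tm} \log b_{j(t)m}$ is used as a stand-in frame similarity; that choice, nearest-frame rounding with Python's `round` (which rounds exact halves to even), and the function names are assumptions.

```python
import math


def linear_map(n_from, n_to):
    """For each of n_from positions, the linearly corresponding index in a
    sequence of length n_to (0-based, rounded to the nearest frame)."""
    if n_from == 1:
        return [0]
    return [round(i * (n_to - 1) / (n_from - 1)) for i in range(n_from)]


def linear_match(u, log_b):
    """Stretch the reference to the input length, then add the frame
    similarities sum_m u_tm log b_{j(t) m} (assumed similarity)."""
    idx = linear_map(len(u), len(log_b))
    return sum(sum(x * y for x, y in zip(u[t], log_b[j]))
               for t, j in enumerate(idx))


log_b = [[math.log(0.8), math.log(0.2)], [math.log(0.3), math.log(0.7)]]
u = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
print(linear_match(u, log_b))
```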

In the first embodiment, an HMM device that performs word spotting accurately and with a small number of calculations by using a distance measure called the Kullback-Leibler divergence is detailed.

In the second embodiment, the feature vector belongs to a plurality of clusters at rates corresponding to its membership degrees to the respective clusters, or at rates corresponding to the posterior probabilities of the respective clusters given the feature vector. Similarity degrees between frames are defined by a probabilistic distance measure based on these membership degrees. Therefore, a pattern comparing device that is robust to spectral variation can be realized, with only a small increase in the number of calculations over the conventional case.

According to the third embodiment, the occurrence probabilities of the clusters are not all stored for each state of the HMM corresponding to each recognition unit; they are stored only up to the N-th order in order of probability, and one common value is stored for the remaining clusters, which share an equal probability. The necessary memory storage is thereby considerably reduced.

According to the fourth embodiment, the model is formed by the multiplication type FVQ/HMM and recognition is performed by the discrete type HMM. Accordingly, estimation errors due to an insufficient number of learning samples in forming the model are decreased, and a device performing a small number of calculations at recognition time can be realized.




