PAC BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is an explanatory view of HMM.
FIG. 2 is a diagram showing an example of HMM that is frequently used in speech recognition.
FIG. 3 is a block diagram showing a conventional example of a speech recognition device by a discrete type HMM.
FIG. 4 is a block diagram showing a conventional example and an embodiment of the present application invention of a speech recognition device by HMM based on fuzzy vector quantization.
FIG. 5 is a block diagram of a conventional example of a speech recognition device by pattern matching.
FIG. 6 is an explanatory view showing an example of a restriction condition in matching paths of DP matching of an input pattern axis dependent type.
FIG. 7 is a block diagram showing a conventional example of a speech recognition device using vector quantization.
FIG. 8 is an explanatory view of a method of word spotting.
FIG. 9 is a block diagram showing an embodiment of a speech recognition device by DP matching according to the present application invention based on fuzzy vector quantization.
FIG. 10 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of an input pattern axis dependent type.
FIG. 11 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of an input pattern axis dependent type.
FIG. 12 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of an input pattern axis dependent type.
FIG. 13 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of an input pattern axis dependent type.
FIG. 14 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of an input pattern axis dependent type.
FIG. 15 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type.
FIG. 16 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type.
FIG. 17 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type.
FIG. 18 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type.
FIG. 19 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type.
FIG. 20 is an explanatory view showing an example of a restriction condition of matching paths of DP matching of a reference pattern axis dependent type which is frequently used.
FIG. 21 is an explanatory view explaining a method of storing an occurrence probability of a cluster in respective states of HMM according to the present application invention or a membership degree of a feature vector to the cluster in a reference pattern of DP matching according to the present application invention.
FIG. 22 is an explanatory view explaining a method of storing an occurrence probability of a cluster in respective states of HMM according to the present application invention or a membership degree of a feature vector to the cluster in a reference pattern of DP matching according to the present application invention.
FIG. 23 is an explanatory view explaining a method of storing a membership degree of a feature vector to a cluster in an input pattern of HMM according to the present application invention or DP matching according to the present application invention.
FIG. 24 is an explanatory view explaining a method of storing a membership degree of a feature vector to a cluster in an input pattern of HMM according to the present application invention or DP matching according to the present application invention.
PAC DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
An explanation will be given of embodiments of the present invention with reference to the drawings as follows.
Equation 7 is derived from the Kullback-Leibler Divergence (hereinafter, KLD) of a distribution u_t = {u_t1, ..., u_tM} and a distribution b_i = {b_i1, ..., b_iM} (Literature: the Institute of Electronics, Information and Communication Engineers Technical Report SP93-27 (June, 1993)). That is, defining a separation degree of u_t from b_i as D(u_t || b_i), KLD is given by the following equation. ##EQU17##
This signifies a difficulty of occurrence of u_t from a population b_i, in other words, a difficulty of occurrence of u_t in a state i. Accordingly, putting log φ_i(y_t) = -D(u_t || b_i), φ_i(y_t) signifies an ease of occurrence of u_t at a state i, which can be used as ω_i(y_t). Then, Equation 1 can be rewritten as follows by substituting φ_i(y_t) for ω_i(y_t). ##EQU18##
Since the component of the equation specified as follows is a quantity determined only by inputs irrespective of a model, it need not be considered when comparing models by the occurrence possibility of an input pattern based on the value of Equation 22. ##EQU19## Then, the following equation can newly be defined. ##EQU20##
That is, Equation 24 is formed by defining the component as follows in Equation 1. ##EQU21##
Equation 7 can be derived in this manner. This is the principle of the multiplication type FVQ/HMM.
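The relationship described above can be illustrated with a short Python sketch. It assumes the standard form of the divergence, D(u || b) = Σ_m u_m log(u_m / b_m), and a multiplication-type score log ω_i(y_t) = Σ_m u_tm log b_im; both forms, and all function and variable names, are assumptions made for illustration, since the equations themselves are not reproduced in this text. The sketch checks that -D(u_t || b_i) and log ω_i(y_t) differ only by a term that depends on the input frame alone.

```python
import math

def kld(u, b):
    """D(u || b) = sum_m u_m * log(u_m / b_m); terms with u_m == 0 contribute 0."""
    return sum(um * math.log(um / bm) for um, bm in zip(u, b) if um > 0.0)

def log_omega(u, b):
    """Assumed multiplication-type score: sum_m u_m * log(b_m)."""
    return sum(um * math.log(bm) for um, bm in zip(u, b) if um > 0.0)

def input_only_term(u):
    """sum_m u_m * log(u_m); depends on the input frame only, not on the model."""
    return sum(um * math.log(um) for um in u if um > 0.0)

u_t = [0.7, 0.2, 0.1]   # membership degrees of one input frame (illustrative)
b_1 = [0.6, 0.3, 0.1]   # cluster occurrence probabilities in state 1 (illustrative)
b_2 = [0.1, 0.3, 0.6]   # cluster occurrence probabilities in state 2 (illustrative)

for b in (b_1, b_2):
    # -D(u || b) equals log_omega(u, b) minus the input-only term.
    assert abs(-kld(u_t, b) - (log_omega(u_t, b) - input_only_term(u_t))) < 1e-12
```

Because the input-only term is the same for every state and model, ranking states by log_omega gives the same order as ranking them by -D under these assumptions.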
However, the above procedure holds under the assumption that every vector forming an input pattern Y occurs once and only once with regard to an arbitrary state series X in each of the HMMs. Meanwhile, when the input pattern is considered to be a concatenation of vector series generated by several models, and we search for the partial section most likely to have occurred from a specific model, a different approach is needed. In this case, we define an input pattern frame S(X) which corresponds to the initial state 1 of a state series X of the model to be compared, and an input pattern frame E(X) which corresponds to the final state J. In principle the following equation is calculated for all X. ##EQU22##
Then, the following equation is calculated, by which S(X*) through E(X*) can be taken as the partial section of the input speech pattern to be found. ##EQU23##
When this calculation is performed, all the combinations of E(X) and S(X) are calculated and the amount of calculation becomes very large. Moreover, in this case Equation 23 depends on X and accordingly cannot be omitted. Solving the optimization problem of Equation 27 by the Dynamic Programming method is therefore considered. Assuming a frame s as a start-end frame, a finish-end frame t is varied in a range of t'±v, and degrees of occurrence of input partial patterns (y_s, ..., y_t) from the HMM μ are calculated by the Dynamic Programming method. In this case Equation 10 is applicable, and an optimum state series maximizing the occurrence degree with respect to each of the input patterns (y_s, ..., y_{t'-v}), ..., (y_s, ..., y_{t'+v}) can be provided by the Dynamic Programming method. That is, the finish-end frame t is moved in a pertinent range and an optimum one is selected from the finish ends. Then an optimum finish end with s as the start end can be calculated. A predetermined value can be used for the range of the finish ends, derived from, for example, an average length of the vector series generated by the HMM to be compared with the start end s. In this case it is necessary to normalize the result by t-s+1 every time t is changed. When the same operation is performed by moving s, the optimum start end and finish end are obtained, which give the desired partial section. In this case it is not necessary to calculate Equation 10 every time t is changed. That is, when the model of FIG. 2 is used, with regard to a certain s, the correspondence of an input feature vector and a state of the HMM is limited to the hatched range of FIG. 8. The range of an input frame m(i) corresponding to a state i in the hatched portion is specified as follows. ##EQU24##
The range of m(i+1) is specified as follows. ##EQU25##
Following Equation 10, φ_{i+1}(m(i+1)) in the range specified by Equation 29 is calculated as a continuation of φ_i(m(i)) in the range of m(i) specified by Equation 28.
Accordingly, when φ_i(m(i)) is calculated in the range of Equation 28 for each of i = 1, ..., J+1, each φ_{J+1}(m(J+1)) provided in the range specified below for i = J+1 becomes the comparison result for a finish-end frame in t'±v when the start-end frame of the input pattern is s. ##EQU26##
That is, in this way the results for the end frames t'±v with regard to one start-end frame s can be calculated in one operation. However, even with this method the above-mentioned calculation must be performed with respect to the above-mentioned hatched range by changing the start end at every frame, and the amount of calculation is very large. To further reduce the amount of calculation, the start-end frame may automatically be determined by the Dynamic Programming method. For that purpose the recursive equation (Equation 10) is manipulated as follows.
Now, define an optimum state series corresponding to y_s, ..., y_t as X* = x*_s, x*_{s+1}, ..., x*_t. To apply the Dynamic Programming method, when s<m'<t and x*_{m'-1} = i, x*_{m'} = j (j≠i) in X* with respect to an input frame m', the optimum state series to which a partial pattern y_s, ..., y_{m'} corresponds must agree with x*_s, ..., x*_{m'}. That is, defining φ_j(m') as a degree of occurrence of the partial pattern y_s, ..., y_{m'} with respect to the state series x*_s, ..., x*_{m'}, z(i) as a section length (number of frames) of a state i at the time, and dur_i(z) = dur_i(z(i)) as a degree of continuing the state i for z frames, the following equation is established. ##EQU27## where j≠i.
Then, the following equation is established. ##EQU28##
When x*_{m-1} = j, x*_m = h for a specific m with m'<m<t, the following equation is established. ##EQU29## where h≠j.
Then, the following equation must be established. ##EQU30##
In the above equations, W_i, W_i', w_di, w_ai, w_bi(k), w_di', w_ai', w_bi(k)' (i = 1, ..., J) and the like designate weighting coefficients accompanying state series or their summations, and Ψ_i(m) designates a degree of occurrence of an input partial pattern y_{s'}, ..., y_{m-z(i)} corresponding to a state series x_{s'}, ..., x_{m-z(i)} (≠ x*_s, ..., x*_{m-z(i)}). W_i designates the total sum of the weighting coefficients along the state series x*_s, ..., x*_{m-z(i)} and W_i' designates the total sum of the weighting coefficients along the state series x_{s'}, ..., x_{m-z(i)}. Here, by adequately selecting the weighting coefficients, Equation 31 through Equation 34 can always be established irrespective of the state series. For example, the above conditions are clearly satisfied if W_i = W_i', w_di' = w_ai, and w_bi(1)' + ... + w_bi(z(i)')' = w_bi(1) + ... + w_bi(z(i)). That is, with regard to a state series which enters a state i from another state at an input frame m, the sum of the weighting coefficients along the state series is made constant irrespective of m, the start-end frame s, and the way the states are taken up to a point (m,i). Substituting numerical values, it is possible to establish w_di = w_ai = 1 and w_bi(k) = 1/z(i) for i = 1, ..., J.
From the above discussion, it is apparent that when the state is changed to j at the input frame t, the degree of occurrence of an optimum partial section up to a point (t,j) can be calculated from the following recursive equation when the number of frames of a partial section of an input corresponding to the state i is defined as z. ##EQU31##
In this case, the i and z that satisfy Equation 35 are designated as i*, z*. The following equation is simultaneously stored.
B_j(t) = B_{i*}(t-z*) (Equation 36)
Then, word spotting can be performed by the following steps.
(1) Initialization
Φ_1(t) = 0 for t = 1, ..., T (π_1 = 1, π_i = 0 ∀ i≠1)
Φ_i(0) = -∞ for i = 1, ..., J
B_1(t) = 0 for t = 1, ..., T
(2) Execute (3) and (4) for t = 1, ..., T+1
(3) Execute Equation 35 and Equation 36 for j = 1, ..., J+1
(4) Φ(t) = Φ_{J+1}(t), B(t) = B_{J+1}(t-z*)
(5) Detect partial patterns
Finish-end frame: ##EQU32##
Start-end frame: B(t*)
In this way the calculation of Φ_i(t) in Equation 35 is performed only once for each (t,j). With respect to the addition of log ω_i(y_{t-z-1+k}) over k = 1 through z, the calculation need not be repeated at every change of z: when the sum over k = 1 through z(m) is designated as s(m), the next sum can be obtained by the calculation s(m+1) = s(m) + log ω_i(y_{t-z+z(m)}), and accordingly the amount of calculation is considerably reduced.
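The saving from the running sum can be sketched as follows in Python. This is a minimal illustration, not the patent's exact indexing: it assumes 0-based frames, a hypothetical list log_omega_i holding log ω_i(y_t) for each frame, and a segment that ends just before frame t and is extended backwards by one frame per step.

```python
def segment_log_sums(log_omega_i, t, z_max):
    """Return sums[z] = log_omega_i[t-z] + ... + log_omega_i[t-1] for z = 1..z_max.

    Each sum is built from the previous one with a single addition,
    instead of re-adding all z terms whenever z changes.
    """
    sums = {}
    running = 0.0
    for z in range(1, min(z_max, t) + 1):
        running += log_omega_i[t - z]   # extend the segment by one earlier frame
        sums[z] = running
    return sums

# Illustrative values only.
example = segment_log_sums([-1.0, -2.0, -3.0, -4.0], t=4, z_max=3)
```

With this arrangement, evaluating all candidate durations z for one (t, i) costs z_max additions rather than on the order of z_max squared.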
Here, in the present invention the following equation is established. ##EQU33##
Then, the following recursive equation is defined corresponding to Equation 35. ##EQU34##
When the model of FIG. 2 is used, the following equation is established. ##EQU35##
Now, assuming for simplicity that an optimum state series is calculated and the length of the state i at that time is designated as z(i)*, the following equation is established. ##EQU36##
Then, the following equation is established. ##EQU37##
When total sums are calculated for both sides, the following equations are established. ##EQU38## Accordingly, when the recursive equation of Equation 31 is used, Φ_{J+1}(t) includes the total sums, taken over the states, of the final term and of the second term from the final term on the right hand side of the equation, irrespective of the state series. This is a quantity that is determined once a model to be compared is determined, irrespective of the input pattern, and is irrelevant to the maximization problem of this equation. Accordingly, this quantity is not necessary in finding the optimum section of the input pattern. Therefore, ω_i(y_t) redefined by the following equation can be adopted. ##EQU39##
Next, an explanation will be given of a method of applying the above-mentioned FVQ to DP matching, which is the second embodiment of the invention of this application. FIG. 9 is a block diagram showing the principle of the present invention. Parts 91 and 93 operate similarly to parts 71 and 73 in FIG. 7. Numeral 92 designates a membership degree matrix calculation unit corresponding to the distance matrix calculation unit 72 in FIG. 7, which calculates a membership degree matrix in this embodiment. That is, it calculates membership degrees u_tm (m = 1, ..., M; u_t1 + u_t2 + ... + u_tM = 1) of a feature vector y_t with respect to a cluster C_m. The membership degree can be a membership function or the like used in fuzzy theory. In this case a membership degree similar to the one used in the above-mentioned FVQ/HMM, such as Equation 4, is used. A word dictionary 95 is comprised of membership degree matrices, each previously registered in correspondence with each word to be recognized. That is, a reference pattern of the r-th word is registered as a membership degree matrix provided by the membership degree matrix calculation unit 92 with respect to its pronunciation. In FIG. 9 a membership degree in a reference pattern of a word r with respect to a cluster m of a j-th frame is indicated by b^(r)_jm. Numeral 96 designates membership degree matrices for words 1, ..., R.
In this case, a similarity degree of a frame t of an input pattern at recognition time and a frame j of a reference pattern r is given as a similarity degree of the membership degree vectors u_t = (u_t1, u_t2, ..., u_tM)^T and b^(r)_j = (b^(r)_j1, b^(r)_j2, ..., b^(r)_jM)^T. Here, u_tm ≥ 0, u_t1 + ... + u_tM = 1, b^(r)_jm ≥ 0 and b^(r)_j1 + ... + b^(r)_jM = 1, and accordingly both of the above-mentioned vectors can be regarded as probability distribution vectors (when u_tm and b^(r)_jm are posterior probabilities, they are exactly probability distributions per se). Therefore, the above-mentioned similarity degree can be given by the Kullback-Leibler Divergence, known as a distance between probability distributions, as in the above-mentioned HMM. That is, the separation degree between distributions (q_1, ..., q_M) and (p_1, ..., p_M) can be given by the following equation. ##EQU40##
By using this, the following three definitions of d^(r)(t,j) can be given. ##EQU41## Equation 45(1) defines a distance useful in adopting the weighting coefficient of Equation 17(1), Equation 45(2) defines a distance useful in adopting the weighting coefficient of Equation 17(2), and Equation 45(3) defines a distance useful in a case where symmetry of the distance is a concern. It is possible to use a definition multiplied by a constant in the case of the addition style and a definition raised to a constant power in the case of the multiplication style.
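The three inter-frame distances can be sketched in Python as follows. Since Equation 45 is not reproduced in this text, the assignment below is an assumption inferred from the later discussion: the first form is taken as D(u_t || b_j), the second as D(b_j || u_t), and the symmetric form as their average. All function names are hypothetical.

```python
import math

def kld(p, q):
    """D(p || q) = sum_m p_m * log(p_m / q_m); requires q_m > 0 wherever p_m > 0."""
    return sum(pm * math.log(pm / qm) for pm, qm in zip(p, q) if pm > 0.0)

def d_input_side(u, b):
    """Assumed form of Equation 45(1): divergence of the reference frame from the input frame."""
    return kld(u, b)

def d_reference_side(u, b):
    """Assumed form of Equation 45(2): divergence of the input frame from the reference frame."""
    return kld(b, u)

def d_symmetric(u, b):
    """Assumed form of Equation 45(3): average of the two directed divergences."""
    return 0.5 * (kld(u, b) + kld(b, u))

u_t = [0.7, 0.2, 0.1]   # input-frame membership vector (illustrative)
b_j = [0.5, 0.3, 0.2]   # reference-frame membership vector (illustrative)
```

The directed forms are generally unequal, which is why a symmetric variant is of interest when the order of the two patterns should not matter.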
A matching unit 94 performs DP matching between the respective membership degree matrices corresponding to the respective words registered in the word dictionary 95 and the membership degree matrices provided by input patterns. That is, the recursive equation (Equation 18) is calculated based on the inter-frame distance d^(r)(t,j) specified by Equation 45, and an accumulation distance D^(r) defined by Equation 13(a) is calculated. A determination unit 97 calculates Equation 14, by which a recognition result is provided.
In this way, each feature vector of a feature vector series pronounced to constitute a word dictionary (or to be compared) is not replaced by only one phoneme-like but is made to correspond, frame by frame, to a plurality of phoneme-likes provided with membership degrees. Hence, the adverse influence of quantization error of the conventional example can be alleviated. Further, as is apparent from the above-mentioned explanation, the present invention is characterized by a calculation of membership degrees and a calculation of distances between respective frames of a reference pattern and input frames that are based on mathematically clear definitions.
Next, an explanation will be given of a further improvement to the present invention.
First, an explanation will be given of a case where Equation 45(1) is used as a distance measure.
In this case the inter-frame distance is specified below. ##EQU42##
When this is substituted into Equation 13 with Equation 17(1) as a weighting coefficient, the following equation is established. ##EQU43##
Here n satisfies 1≤n≤k-1 and t(k)-t(k-n)=1 (there is no jump with regard to an input pattern frame in a matching path). When the sum of the weighting coefficients along a path from x(k-n) to x(k) is 1, that is, w(k-n+1)+w(k-n+2)+...+w(k)=1, as for example in the cases shown by FIGS. 10 through 14, Equation 47 becomes as follows. ##EQU44##
Examples of FIGS. 10 through 14 show cases for x(k)=(t,j), k-1≥n≥1, wherein in FIG. 10, x(k-1)=(t-1,j) or (t-1,j-n); in FIGS. 11 and 14, x(k-1)=(t-1,j) or (t-1,j-1) and x(k-n)=(t-1,j-m) for m=2, ..., n; in FIGS. 12 and 13, x(k-1)=(t-1,j), (t-1,j-1) or (t,j-1) and x(k-m)=(t,j-m), x(k-n)=(t-1,j-n) for m=2, ..., n-1. Numerical values appearing next to paths in the respective figures designate an example of weighting coefficients along these paths.
As the first term of the right hand side of Equation 48 is independent of the way of selecting paths and reference patterns, it represents an amount determined only by input patterns. Accordingly, it can be omitted when only the relative magnitude of the comparison results between the respective reference patterns and input patterns matters. By omitting the term and changing the sign, the following equation can be used as a similarity degree between patterns. ##EQU45##
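The reason the first term is path-independent can be written out as a short derivation. It is a sketch under the assumption that Equation 45(1) is the divergence D(u_t || b^(r)_j) and that the weights of all path steps sharing one input frame t sum to 1; the exact forms of Equations 46 through 49 are not reproduced in this text.

```latex
\begin{align*}
d^{(r)}(t,j) &= \sum_{m=1}^{M} u_{tm}\log u_{tm} \;-\; \sum_{m=1}^{M} u_{tm}\log b^{(r)}_{jm},\\
\sum_{k} w(k)\, d^{(r)}\bigl(x(k)\bigr)
 &= \sum_{t}\Bigl(\sum_{k:\,t(k)=t} w(k)\Bigr)\sum_{m} u_{tm}\log u_{tm}
   \;-\; \sum_{k} w(k)\sum_{m} u_{t(k)m}\log b^{(r)}_{j(k)m}\\
 &= \underbrace{\sum_{t}\sum_{m} u_{tm}\log u_{tm}}_{\text{input only}}
   \;-\; \sum_{k} w(k)\sum_{m} u_{t(k)m}\log b^{(r)}_{j(k)m}.
\end{align*}
```

Under these assumptions, minimizing the accumulated distance over paths and reference patterns is equivalent to maximizing the second sum, which is the role played by the similarity degree defined next.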
In this case an inter-frame similarity degree between an input frame t and a frame j of a reference pattern can be defined as follows. ##EQU46##
Here, when t(k)-t(k-1)=1 (there is no jump nor duplication with respect to an input pattern frame in matching paths, corresponding to FIG. 6 or FIG. 10), the following equation is established. ##EQU47##
In the above equation, j=j(t) designates a function showing a matching path in a t-j plane which is provided by eliminating k from t=t(k), j=j(k). When the paths in FIG. 6 are used, the matching unit 94 calculates a recursive equation specified below based on the inter-frame similarity degree s^(r)(t,j), by which the accumulation similarity degree S^(r) defined by Equation 51 is calculated. ##EQU48## where f(1,1) = s^(r)(1,1) and S^(r) = f(x(K)) = f(I, J^(r)).
The determination unit 97 calculates the following equation, by which a recognition result is provided. ##EQU49##
The above-mentioned method, in which n satisfies 1≤n≤k-1 and t(k)-t(k-n)=1 (there is no jump with respect to an input pattern frame in matching paths) and the sum of weighting coefficients along paths from x(k-n) to x(k) is 1, that is, w(k-n+1)+w(k-n+2)+...+w(k)=1, is useful in continuous word voice recognition. The reason is that in this way, the problem of finding the optimum connection pattern of individually registered recognition word reference patterns that is most similar to an input word voice pattern pronounced continuously can be solved effectively by using the well-known two-stage DP or the like. The inter-frame similarity degree proposed here is applicable to such a case and can provide high recognition performance by simple calculation.
Next, an explanation will be given of a case in which Equation 45(2) is used as a distance measure.
In this case the inter-frame distance is specified below. ##EQU50##
When this is substituted into Equation 13 with Equation 17(2) as a weighting coefficient, the following equation is established. ##EQU51##
Here n satisfies 1≤n≤k-1 and j(k)-j(k-n)=1 (there is no jump with respect to a reference pattern frame in matching paths). When the sum of weighting coefficients along paths from x(k-n) to x(k) is equal to 1, that is, w(k-n+1)+w(k-n+2)+...+w(k)=1, as for example in the case of FIGS. 15 through 17, Equation 55 becomes ##EQU52##
Examples of FIGS. 15 through 19 show cases for x(k)=(t,j), k-1≥n≥1, wherein in FIG. 15, x(k-1)=(t,j-1) or (t-n,j-1); in FIGS. 16 and 19, x(k-1)=(t,j-1) or (t-1,j-1), x(k-m)=(t-m,j-1) for m=2, ..., n; and in FIGS. 17 and 18, x(k-1)=(t,j-1), (t-1,j-1) or (t-1,j) and x(k-m)=(t-m,j), x(k-n)=(t-n,j-1) for m=2, ..., n-1. Numerical values appearing next to paths in the respective figures designate an example of weighting coefficients along the paths in the respective cases.
In this case, the first term on the right hand side of Equation 56 is independent of the way of selecting paths and of any section of an input pattern, and is a quantity determined only by a reference pattern (this quantity with respect to a reference pattern r is designated as C^(r)). Therefore, it can be omitted when only the relative magnitude of the comparison results between a certain reference pattern and various sections of an input continuous word speech pattern (or various input patterns) matters. Therefore, by omitting the term and changing the sign, the following equation can be used as a similarity degree between patterns. ##EQU53##
In this case an inter-frame similarity degree between an input frame t and a frame j of a reference pattern r can be specified as follows. ##EQU54##
In determining to which reference pattern the input pattern is closest by using this definition of the inter-frame similarity degree, the values (S^(r) - C^(r))/J^(r) are compared and the maximum one is found.
Here, when j(k)-j(k-1)=1 (there is no jump nor duplication with respect to a reference pattern frame in matching paths, corresponding to FIG. 20 or FIG. 15) is further established, the following equation is derived. ##EQU55##
In the above equation, t=t(j) designates a function indicating a matching path in a t-j plane which is provided by eliminating k from t=t(k) and j=j(k). When the paths in FIGS. 15 through 19 are used, the matching unit 94 calculates a recursive equation specified below based on the inter-frame similarity degree s^(r)(t,j), defined by Equation 58, by which the accumulation similarity degree S^(r), defined in Equation 59, is calculated. ##EQU56## where f(1,1) = s^(r)(1,1) and S^(r) = f(x(K)) = f(I, J^(r)).
The above-mentioned method, in which n satisfies 1≤n≤k-1 and j(k)-j(k-n)=1 (there is no jump with respect to a reference pattern frame in matching paths) and the sum of weighting coefficients along paths from x(k-n) to x(k) is equal to 1, that is, w(k-n+1)+w(k-n+2)+...+w(k)=1, is useful in performing so-called word spotting, in which the partial section that best aligns with a certain reference pattern is identified in an input pattern of continuous word speech produced by pronouncing words continuously. In this case, only the values of S^(r) in the respective sections are compared, irrespective of the lengths of the sections of the input pattern, when the reference pattern to be compared is designated by r. That is, in this way the problem of word spotting can be solved effectively by applying the Dynamic Programming method using the following steps. The inter-frame similarity degree proposed here is applicable to such a case and can provide high recognition performance by simple calculation. For example, word spotting with respect to a certain word is performed by using the path restriction condition of FIG. 20 as follows.
(1) Initialization
f(0,j) = f(-1,j) = -∞ for j = -1, 0, 1, ..., J
f(0,0) = 0
(2) Execute (3) through (6) with respect to t = 1, ..., T+1
(3) f(t,0) = -∞
(4) f(t,1) = s(t,1)
(5) B(t,1) = t-1
(6) Calculate the following recursive equation with respect to j = 2, ..., J ##EQU57##
(7) D(t) = f(t,J), B(t) = B(t,J)
(8) Detect partial pattern
Finish end frame: ##EQU58##
Start end frame: B(t*)+1
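The steps above can be sketched in Python as follows. FIG. 20 and the recursive equation of step (6) are not reproduced in this text, so the path constraint used here is an assumption: the point (t, j) is reached from (t-1, j-1) or (t-2, j-1), which is consistent with the initialization of f(0,j) and f(-1,j) but is not confirmed by the source. Frames are 0-based, the backpointer stores the start frame directly (instead of t-1 followed by adding 1), and the function name spot_word is hypothetical.

```python
import math

def spot_word(s):
    """s[t][j]: similarity between input frame t and reference frame j (0-based).

    Returns (start_frame, end_frame, score) of the best-matching input section.
    """
    T, J = len(s), len(s[0])
    NEG_INF = -math.inf
    f = [[NEG_INF] * J for _ in range(T)]   # accumulated similarity f(t, j)
    B = [[-1] * J for _ in range(T)]        # start frame carried along the best path

    for t in range(T):
        # Steps (4) and (5): any input frame may begin the word.
        f[t][0] = s[t][0]
        B[t][0] = t
        # Step (6): assumed predecessors (t-1, j-1) and (t-2, j-1).
        for j in range(1, J):
            best, best_start = NEG_INF, -1
            for dt in (1, 2):               # hypothetical path constraint
                if t - dt >= 0 and f[t - dt][j - 1] > best:
                    best, best_start = f[t - dt][j - 1], B[t - dt][j - 1]
            if best > NEG_INF:
                f[t][j] = best + s[t][j]
                B[t][j] = best_start

    # Steps (7) and (8): choose the finish frame with the largest accumulated score.
    end = max(range(T), key=lambda t: f[t][J - 1])
    return B[end][J - 1], end, f[end][J - 1]

# Illustrative similarity matrix: 6 input frames, 3 reference frames,
# with a perfect diagonal match over input frames 2..4.
s_example = [[-1.0] * 3 for _ in range(6)]
s_example[2][0] = s_example[3][1] = s_example[4][2] = 0.0
result = spot_word(s_example)
```

Because every reference frame is visited exactly once on any path, scores for sections of different input lengths are directly comparable, so no length normalization appears in the final selection.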
Next, an explanation will be given of a method of reducing the amount of memory storage and the computational burden in the above-mentioned HMM and DP, which is the third embodiment of the present invention.
The basic idea is to store the membership degrees of the reference pattern only for the N<M highest-ranked clusters in the case of reducing the storage amount, and to calculate the membership degrees of the input pattern only for the K<M highest-ranked clusters in the case of reducing the computational burden. In this case, attention should be paid to the fact that, although p_i = 0 for some i ∈ {1, ..., M} is possible, q_i > 0 is required for all i ∈ {1, ..., M}, that is, q_i cannot be 0, in defining the similarity between a probability distribution (p_1, ..., p_M) and a probability distribution (q_1, ..., q_M) as follows. ##EQU59## Accordingly, in the case of calculating or storing only the N highest-ranked q_i, a common value is determined and used for the remaining q_i such that q_1 + ... + q_M = 1. Therefore, the amount of storage necessary for q_i (i = 1, ..., M) in this case is N with regard to q_g(1), ..., q_g(N) and 1 with regard to q_g(N+1), ..., q_g(M). In the above notation, g(n) is the suffix of the n-th largest q among {q_1, ..., q_M}. Although the same treatment can be performed with regard to p_i as with q_i by classifying them into those ranked up to K and those ranked K+1 and lower (K is not necessarily equal to N), these can be 0, and accordingly it is possible to set p_h(1) + ... + p_h(K) = 1 and p_h(K+1) + ... + p_h(M) = 0, where h(k) is defined as the suffix of the k-th largest p among {p_1, ..., p_M}. In this case the amount of storage necessary for p_i (i = 1, ..., M) is only K, with regard to p_h(1), ..., p_h(K). ω^(r)_i(y_t) in the above-mentioned multiplication type FVQ/HMM (attaching (r) as a superscript indicates that the above-mentioned ω_i(y_t), b_im, a_ij, etc. relate to a word r) and s^(r)(t,j) in the multiplication type FVQ/DP are all in the form of Equation 63, and the same applies with regard to the above-mentioned reduction in storage amount and computational burden.
Therefore, an explanation will be given of embodiments with respect to the multiplication type FVQ/HMM, that is, ω^(r)_i(y_t). In this case, if a state j in an HMM is read as a j-th frame of the reference pattern in DP, and the occurrence probability b^(r)_jm of a cluster m in the state j of an HMM is read as the membership degree of the j-th frame of a reference pattern r to a cluster m in DP matching, the same discussion as in the case of the HMM applies also in the case of DP matching.
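The storage scheme for q described above can be sketched in Python as follows. The text only says that a common value is chosen for the remaining entries so that the total is 1; spreading the leftover mass equally, (1 - sum of the top N) / (M - N), is the assumption used here, and the function names are hypothetical.

```python
def truncate_top_n(q, n):
    """Keep the n largest entries of q (label -> value) plus one common value.

    The common value is assumed to be the leftover probability mass shared
    equally among the remaining len(q) - n entries, so every entry stays > 0
    as long as the kept entries sum to less than 1. Requires n < len(q).
    """
    m = len(q)
    order = sorted(range(m), key=lambda i: q[i], reverse=True)
    kept = {i: q[i] for i in order[:n]}             # N stored label/value pairs
    common = (1.0 - sum(kept.values())) / (m - n)   # 1 extra stored value
    return kept, common

def expand(kept, common, m):
    """Rebuild a full length-m distribution from the compact form."""
    return [kept.get(i, common) for i in range(m)]

q_full = [0.5, 0.3, 0.1, 0.06, 0.04]                # illustrative values
kept, common = truncate_top_n(q_full, 2)
q_approx = expand(kept, common, len(q_full))
```

For the input-side distribution p, zero entries are allowed, so the entries below rank K can simply be dropped and the top K renormalized instead of being replaced by a common value.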
The following methods are conceivable for reducing the respective amounts of storage with regard to u_tm and b^(r)_jm in the definition of ω^(r)_j(y_t). In the following equations, a suffix g(r,j,n) signifies the name (number) of the cluster whose occurrence probability in the j-th state of HMM r is the n-th largest, b^(r)_j,g(r,j,n) signifies the occurrence probability of the cluster g(r,j,n) at the j-th state of HMM r, h(t,k) signifies the name of the cluster for which the membership degree of the feature vector of the t-th frame of the input pattern is the k-th largest, and u_t,h(t,k) signifies the membership degree of y_t to the cluster h(t,k).
PAL (First Method)
##EQU60##
In the above equation, with regard to b^(r)_j,g(r,j,n), the estimated values are used as they are for 1≤n≤N, and the value specified as follows is used for N+1≤n≤M. ##EQU61##
For u_tm, the estimated values may be used for all m in 1≤m≤M (1.1), or u_t,h(t,k) may be estimated as specified below for 1≤k≤K, with u_t,h(t,k) = 0 for K+1≤k≤M (1.2). ##EQU62##
In the case of (1.2), the reduction in the membership degree calculation is simultaneously performed (mentioned later).
PAL (Second Method)
##EQU63## where b^(r)_j,g(r,j,n) is estimated as specified below for 1≤n≤N and b^(r)_j,g(r,j,n) = 0 for N+1≤n≤M. ##EQU64## For u_t,h(t,k), the estimated values may be used for all k in 1≤k≤M (2.1), or u_t,h(t,k) similar to the above-mentioned is used for 1≤k≤K and the following equation is established for K+1≤k≤M (2.2). ##EQU65##
In the case of (2.2) the reduction in the membership degree calculation is simultaneously performed (mentioned later).
PAL (Third Method)
##EQU66## where, with regard to u_tm, u_t,h(t,k) is the value specified below for k = 1, ..., K. ##EQU67## For b^(r)_j,g(r,j,n), the estimated values may be used for all n = 1, ..., M (3.1), or it may be estimated as specified below for 1≤n≤N, setting b^(r)_j,g(r,j,n) = 0 for N+1≤n≤M (3.2). ##EQU68## In the case of (3.2) the reduction in the storage amount is simultaneously performed.
PAL (Fourth Method)
##EQU69## where u_t,h(t,k) may be estimated as specified below for 1≤k≤K and u_t,h(t,k) = 0 for K+1≤k≤M. ##EQU70## For b^(r)_j,g(r,j,n), the estimated values may be used for all n in 1≤n≤M (4.1), or b^(r)_j,g(r,j,n) may be defined as the estimated values for 1≤n≤N and as specified below for N+1≤n≤M (4.2). ##EQU71## In the case of (4.2) the reduction in the storage amount is simultaneously performed.
In the first method, the second method, (3.2) of the third method and (4.2) of the fourth method, the occurrence probabilities of the clusters in the respective states of the HMM (or the membership degrees as word reference patterns in DP) are not stored with respect to all the clusters; instead, the labels and the probabilities (membership degrees) are stored for the clusters having the probabilities (membership degrees) up to the N-th order among the clusters having high probabilities (membership degrees) with respect to each state of the HMM (each frame of the reference patterns of DP). For example, an HMM (reference pattern) with regard to the r-th word is as shown in FIG. 21 or FIG. 22. FIG. 21 can be used in the case of defining the similarity degree by Equation 67 and Equation 70, and FIG. 22 can be used in the case of defining the similarity degree by Equation 64 and Equation 73.
In (1.2) of the first method, (2.2) of the second method, the third method and the fourth method, the membership degree matrix as the input pattern is not calculated with respect to all the clusters; instead, the membership degrees up to the K-th order among the clusters having high membership degrees are calculated with respect to each frame of the input pattern. For example, the input pattern is as shown by FIG. 23 or FIG. 24. FIG. 23 can be used in the case of defining the similarity degree by Equation 64 and Equation 73, and FIG. 24 can be used in the case of defining the similarity degree by Equation 67 and Equation 70.
In the case of Equation 64 and Equation 73, if $\log b^{(r)}_{j,g(r,j,n)}$ is stored as the membership matrix of the reference patterns in place of $b^{(r)}_{j,g(r,j,n)}$ in FIG. 22 (not shown), this calculation becomes a product-sum calculation. In this case the computational burden is greater than in the conventional example of FIG. 7: assuming N=3 in Equation 64 and K=3 in Equation 73, three multiplication operations are added at each lattice point, and accordingly the number of multiplication operations is 2560+3×50×100=17560. The computational burden is therefore certainly increased compared with the conventional example of FIG. 7, but it is considerably smaller than in the case of FIG. 5, and higher recognition accuracy can be provided than in the conventional example of FIG. 7.
In the case of Equation 67 and Equation 70, if $\log u_{t,h(t,k)}$ is stored in place of $u_{t,h(t,k)}$ in FIG. 24 (not shown) as the membership matrix of the input pattern, this calculation becomes a product-sum calculation. In this case the amount of calculation is greater than in the conventional example of FIG. 7: assuming N=3 in Equation 67 and K=3 in Equation 70, three multiplication operations are added at each lattice point, and accordingly the number of multiplication operations is 2560+3×50×100=17560. The number of calculations is certainly increased compared with the conventional example of FIG. 7, but it is considerably less than in the case of FIG. 5, and the recognition accuracy is higher than in the conventional example of FIG. 7. Compared with the case of the preceding paragraph, where $\log b^{(r)}_{j,g(r,j,n)}$ is stored, calculation of $\log u_{t,h(t,k)}$ is necessary for each frame of the input pattern. However, if K=3, this is performed only 3 times for each frame, and $u_{t,h(t,k)}$ takes only numerical values between 0 and 1. Therefore, if a table of $\log x$ for $0 \le x \le 1$ is formed, table look-up may be performed in place of the calculation.
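The table look-up mentioned above can be sketched as follows. This is an illustration under stated assumptions only: the table resolution `TABLE_SIZE`, the floor `FLOOR` used to avoid evaluating log 0, and the helper names `log_lookup` and `product_sum` are all hypothetical, and the product-sum shown is a generic sum of weight times log-membership rather than a reproduction of Equation 67 or Equation 70.

```python
import math

TABLE_SIZE = 1024   # hypothetical table resolution
FLOOR = 1e-6        # hypothetical floor so that log(0) is never evaluated

# Precompute log(x) at x = i / TABLE_SIZE for i = 0 .. TABLE_SIZE.
LOG_TABLE = [math.log(max(i / TABLE_SIZE, FLOOR)) for i in range(TABLE_SIZE + 1)]

def log_lookup(x):
    """Approximate log(x) for 0 <= x <= 1 by nearest-entry table look-up."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return LOG_TABLE[round(x * TABLE_SIZE)]

def product_sum(weights, memberships):
    """Sum of weight * log(membership) over K paired entries."""
    return sum(w * log_lookup(u) for w, u in zip(weights, memberships))
```

With K=3, each input frame needs only three look-ups, after which every lattice point costs three multiplications and additions.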
If the membership degree is defined by Equation 4 with $u_{t,h(t,1)}+\cdots+u_{t,h(t,K)}=1$ and $u_{t,h(t,K+1)}=\cdots=u_{t,h(t,M)}=0$, the order of increasing $u_{tm}$ is equal to the order of decreasing $d(y_t,\mu_m)$. Accordingly, $d(y_t,\mu_m)$ is first calculated for all the clusters, and $u_{tm}$ then needs to be calculated only for the K clusters having the smallest $d(y_t,\mu_m)$, by which the amount of calculation can be reduced. That is, assume that the following equation is established.
##EQU72##
The membership degree for $1 \le k \le K$ is given as follows.
##EQU73##
In this case, both the fraction calculation of the denominator in Equation 76 and the calculation of Equation 77 are performed K times. If M=256 and K=3 through 6, the amount of calculation becomes approximately 1/80 through 1/40.
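The saving can be illustrated with a short sketch. Equation 4 is not reproduced in this section, so the code assumes the usual fuzzy-clustering membership, proportional to $(1/d)^{1/(F-1)}$ with fuzziness $F>1$; that formula, the function name `top_k_memberships`, and the sample distances are assumptions for illustration only.

```python
import numpy as np

def top_k_memberships(distances, k, fuzziness=2.0):
    """Memberships of the k nearest clusters, normalised to sum to 1.

    Assumes membership proportional to (1/d) ** (1/(fuzziness - 1)).
    All other clusters implicitly receive membership 0.
    """
    d = np.asarray(distances, dtype=float)
    nearest = np.argsort(d, kind="stable")[:k]        # k smallest distances
    safe = np.maximum(d[nearest], 1e-12)              # guard against d == 0
    weights = (1.0 / safe) ** (1.0 / (fuzziness - 1.0))
    return nearest, weights / weights.sum()

# Hypothetical distances from one observation vector to M = 4 centroids.
nearest, u = top_k_memberships([4.0, 1.0, 2.0, 8.0], k=2)
```

Only K reciprocal-power terms are evaluated instead of M, which is where the reduction by roughly a factor of M/K comes from.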
When the membership degree is defined by Equation 4 where $u_{t,h(t,K+1)}=\cdots=u_{t,h(t,M)}=u_{t0}$ and $u_{t,h(t,1)}+\cdots+u_{t,h(t,M)}=1$, the membership degree for $1 \le k \le K$ is calculated by the following Equation 79, putting D as specified in the following Equation 78.
##EQU74##
(Equation 79)
i) With respect to $m \in \{h(1),\ldots,h(K)\}$, the membership degree is defined as
##EQU75##
ii) With respect to $m \in \{h(K+1),\ldots,h(M)\}$, the membership degree is defined as a common value,
##EQU76##
M fraction calculations of the denominator of Equation 78 are necessary. As the increasing order of the values of $u_{tm}$ is equal to the decreasing order of the values of $d(y_t,\mu_m)$, $d(y_t,\mu_m)$ is calculated first for all the clusters, and the calculation of $u_{tm}$ in i) of Equation 79 may be performed for the K clusters having the smallest $d(y_t,\mu_m)$.
The following is a method for further simplification. For example, Equation 78 is approximately specified as below by putting $d_{t0}=\{d(y_t,\mu_{h(K+1)})+\cdots+d(y_t,\mu_{h(M)})\}/(M-K)$ or $d_{t0}=\{d(y_t,\mu_{h(K+1)})+d(y_t,\mu_{h(M)})\}/2$, and $d(y_t,\mu_{h(K+1)})=\cdots=d(y_t,\mu_{h(M)})=d_{t0}$.
##EQU77##
Another possibility is that the membership degree calculating means calculates the membership degrees from the distances between the observation vectors and the representative vectors of the respective clusters, with the clusters ranked in increasing order of distance so that the cluster having the smallest distance is the first-order one. A predetermined constant value of 1/K or less is assigned to the clusters of the (K+1)-th order and below, and the membership degrees of the clusters from the first to the K-th order are calculated successively (starting from the smallest distance) from the individual distances and the above-mentioned constant value, such that the total sum of the membership degrees becomes 1.
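A sketch of this constant-value variant follows. It is not Equation 79: the way the remaining mass is shared among the K nearest clusters (proportional to $(1/d)^{1/(F-1)}$), the function name `memberships_with_constant`, and the sample values are assumptions chosen only so that the memberships sum to 1.

```python
import numpy as np

def memberships_with_constant(distances, k, constant, fuzziness=2.0):
    """Give `constant` to every cluster ranked below k and share the rest.

    The k nearest clusters split the remaining mass 1 - (M - k) * constant
    in proportion to (1/d) ** (1/(fuzziness - 1)), an assumed weighting.
    """
    d = np.asarray(distances, dtype=float)
    m = d.size
    remaining = 1.0 - (m - k) * constant
    if remaining <= 0.0:
        raise ValueError("constant is too large for the memberships to sum to 1")
    nearest = np.argsort(d, kind="stable")[:k]
    safe = np.maximum(d[nearest], 1e-12)              # guard against d == 0
    weights = (1.0 / safe) ** (1.0 / (fuzziness - 1.0))
    u = np.full(m, constant, dtype=float)
    u[nearest] = remaining * weights / weights.sum()
    return u

# Hypothetical distances to M = 4 centroids, K = 2, constant value 0.05.
u = memberships_with_constant([4.0, 1.0, 2.0, 8.0], k=2, constant=0.05)
```

Because the constant is fixed in advance, only the K nearest clusters need any per-frame arithmetic.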
In the case of the multiplication type DP matching, the reference pattern is the membership degree vector series. When the membership degrees are registered in the reference pattern only for the clusters ranked up to the N-th order in descending order of membership degree, the same method as that applied to the above-mentioned $u_{tm}$ can be used. That is, when $b^{(r)}_{j,g(r,j,1)}+\cdots+b^{(r)}_{j,g(r,j,N)}=1$ and $b^{(r)}_{j,g(r,j,N+1)}=\cdots=b^{(r)}_{j,g(r,j,M)}=0$, by putting $K \to N$, $h(t,k) \to g(r,j,n)$ and $u_{t,h(t,k)} \to b^{(r)}_{j,g(r,j,n)}$, $b^{(r)}_{jm}$ can be calculated in correspondence with Equation 76 and Equation 77. Similarly, when $b^{(r)}_{j,g(r,j,N+1)}=\cdots=b^{(r)}_{j,g(r,j,M)}$ take a common value and $b^{(r)}_{j,g(r,j,1)}+\cdots+b^{(r)}_{j,g(r,j,M)}=1$, $b^{(r)}_{jm}$ can be calculated in correspondence with Equation 78, Equation 79, Equation 80 or the like.
Next, an explanation will be given of a fourth embodiment of the application. This embodiment is effective in the above-mentioned case of an HMM. The idea comes from the fact that the evaluation of $u_{tm}$ is theoretically not influenced even if the above-mentioned K used for estimating $b_{im}$ differs from the above-mentioned K used for recognition. Setting aside the case of forming a model, it is often preferable that the number of calculations in recognition be as low as possible. The discrete type HMM involves the least number of calculations, and it corresponds to recognition with K=1 in the FVQ/HMM. Accordingly, it is possible to form the model by the method of the FVQ/HMM and to perform recognition by the method of the discrete type HMM. As mentioned above, the FVQ is more effective in alleviating the problem of an insufficient number of learning samples in learning the parameters of HMMs, and thereby increasing the estimation accuracy of the parameters, than in reducing the quantization distortion of vector quantization through interpolation. When a model is formed by the FVQ type and recognition is performed by the discrete type, the recognition rate is little deteriorated compared with the case of using the FVQ type for both forming a model and recognition. Moreover, when the code book size is large, an improvement in recognition rate over the case of using the discrete type for both forming a model and recognition can be verified experimentally.
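The correspondence between K=1 and the discrete type HMM can be illustrated with a short sketch. It assumes a frame score of the form $\sum_m u_{tm} \log b_{jm}$ restricted to the stored labels, which is an assumption for illustration and is not taken from this section; the function name `frame_score` and all numerical values are hypothetical.

```python
import numpy as np

def frame_score(log_b_row, labels, memberships):
    """Sum over the given labels of membership * log occurrence probability."""
    log_b_row = np.asarray(log_b_row, dtype=float)
    return float(np.dot(memberships, log_b_row[labels]))

# Hypothetical log occurrence probabilities of 3 clusters in one HMM state.
log_b = np.log([0.1, 0.6, 0.3])
# Hypothetical distances from the observation vector to the 3 centroids.
distances = [3.0, 0.5, 1.0]

# K = 1: the single nearest cluster receives membership 1, so the score
# is just the log probability of that one label, as in a discrete HMM.
nearest = int(np.argmin(distances))
discrete_score = frame_score(log_b, [nearest], [1.0])

# K = 2 with hypothetical memberships 2/3 and 1/3 for the two nearest clusters.
fvq_score = frame_score(log_b, [1, 2], [2.0 / 3.0, 1.0 / 3.0])
```

With K=1 no membership arithmetic remains at recognition time, so a model whose parameters were estimated with a larger K can be evaluated at the cost of a discrete HMM.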
Further, in the case of matching by linear expansion and contraction of the time axis, comparison of the input pattern with the reference pattern can be performed based on comparison between the above-mentioned membership degree vectors. When the linear expansion and contraction matches the number of frames of a reference pattern to the number of frames of an input pattern, the definition of similarity degree of Equation 7 can be used. When it matches the number of frames of an input pattern to the number of frames of a reference pattern, the definition of similarity degree of Equation 43 can be used.
In the first embodiment, an HMM device is detailed in which word spotting is performed accurately and with a small number of calculations by using a distance measure called the Kullback-Leibler divergence.
In the second embodiment, the feature vector belongs to a plurality of clusters at rates corresponding to the membership degrees with regard to the respective clusters, or at rates corresponding to the posterior probabilities of the respective clusters given the feature vector. Similarity degrees among frames are defined by a probabilistic distance measure based on the membership degrees. Therefore, a pattern comparing device that is robust to variation in spectra can be realized with only a small increase in the number of calculations over the conventional case.
According to the third embodiment, occurrence probabilities of all theclusters are not stored for each state of the HMM corresponding to eachrecognition unit but are stored up to the N-th order in the order ofprobability. One common value is stored for the remaining ones, as theyshare an equal probability, by which the necessary memory storage isconsiderably reduced.
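The compact storage of the third embodiment can be sketched as follows. The function names `compact_state` and `expand_state` are hypothetical, and setting the common value to the mean of the remaining probabilities is an assumption made so that the expanded row still sums to 1.

```python
def compact_state(probs, n):
    """Keep the n most probable labels and one common value for the rest."""
    order = sorted(range(len(probs)), key=lambda m: -probs[m])[:n]
    kept = [probs[m] for m in order]
    # Assumed choice: the remaining clusters share the leftover mass equally.
    common = (1.0 - sum(kept)) / (len(probs) - n)
    return order, kept, common

def expand_state(labels, kept, common, num_clusters):
    """Rebuild a full row of occurrence probabilities from the compact form."""
    row = [common] * num_clusters
    for label, p in zip(labels, kept):
        row[label] = p
    return row

# Hypothetical occurrence probabilities of M = 6 clusters in one HMM state.
b_j = [0.02, 0.50, 0.08, 0.30, 0.04, 0.06]
labels, kept, common = compact_state(b_j, n=2)
row = expand_state(labels, kept, common, num_clusters=6)
```

Per state, the compact form holds N labels, N probabilities and one common value instead of M probabilities; with M=256 and N=3 that is 7 stored numbers rather than 256.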
According to the fourth embodiment, formation of a model is performed by the multiplication type FVQ/HMM and recognition is performed by the discrete type HMM. Accordingly, estimation errors due to an insufficient number of learning samples in forming the model are decreased, and a device performing a small number of calculations at recognition time can be realized.