The method comprises the steps of:
time dividing the time-evolving signal into sub-frames, and
for at least one sub-frame:
determining at least one parameter value (Psfi) yielded by the signal over at least a part of that sub-frame, and
expressing said parameter value as a component of a vector (S) along a dimension thereof specifically allocated to that sub-frame (SFi),
the resultant vector yielding the signature (S) of said gesture.
The invention also relates to an apparatus producing such signatures and commands by gesture recognition.
[0002] In the more general field of body movement analysis, much of the early research was conducted for medical applications, notably in orthopaedics and prosthesis control. In that context, the analysis is conducted on patients who produce well-defined movements in controlled environments. In particular, the movements are analysed from a fixed reference position of a body portion. For instance, the movement of a forearm is generally made with the elbow or shoulder held stationary at a materialised reference point, e.g. using a rest. With such a restraint, the movements analysed are not gestures produced in free space, where, on the contrary, the body and limbs are allowed to move freely without calling on an absolute reference or point of origin.
[0003] There is now a demand for man-machine interfacing techniques that make it possible to extract exploitable information from gestures produced in free space, notably for controlling apparatus or providing feedback on body movements. As a non-limiting example, such man-machine interfaces find applications for commanding music/video (media) player functions, video games, simulators, personal computers, domestic appliances, and can also be used for music generation and other forms of artistic expression.
[0004] Extracting useful information from gestures in free space in such a context imposes technical challenges not encountered when a body portion is physically restrained to a fixed reference position. In particular, efficient algorithms need to be implemented to produce a suitable signature from the raw data signals at the sensor outputs, and identify the corresponding gesture from that signature with acceptable reliability and speed.
[0005] Typically, the algorithm operates in two modes: a learning—or training—mode, in which a correlation is established between each one of a set of selected gestures and the end information extracted from the corresponding raw data signals, and a recognition mode, in which the correlation acquired in the learning mode is used to identify a particular end information, such as a command, from its corresponding gesture.
[0006] An example of a known technique for analysing arm movements in three-dimensional space is disclosed in a paper entitled “EMG-based Human-Machine Interface System” by O. A. Alsayegh, published in Proceedings of the IEEE Conference on Multimedia and Exposition, 2000. In this approach, EMG electrodes are placed on shoulder and elbow joints and their output signals are frequency filtered at a preprocessing stage. Then, their envelope is extracted. Control points representing local and global maxima and minima (extrema) are extracted from the envelope. Finally, a temporal signature is constructed on the basis of the time instances of the recorded extrema to identify the corresponding gesture. More particularly, the preprocessed EMG signal is time integrated for each time interval between two successive extrema of the envelope. A plot is then produced in a feature space defined by three orthogonal axes, respectively corresponding to different muscles of the subject. For each of these muscles, a given time integral of the envelope signal is plotted on its respective axis. For each particular gesture, successive plotted points produce an ellipsoidal figure in that space which constitutes the gesture's signature. These ellipsoidal figures then serve as references for identifying future gestures.
[0007] The study demonstrated that twelve different arm gestures could be recognised in this way.
[0008] The invention envisages a different approach which is targeted principally—but not exclusively—at achieving fast and accurate gesture recognition, and which can be used even with less pronounced gestures, such as those involving mainly hand movements. The technique according to the invention results in a highly responsive and intuitive interaction between the subject producing the gestures and the interfaced electronic equipment which can be used to advantage in many different applications in consumer, professional, and institutional fields.
[0009] More particularly, the invention provides, according to a first aspect, a method of obtaining a signature of a gesture produced in free space, by deriving at least one time-evolving signal representative of muscular activity involved in producing the gesture and determining at least one value of a parameter yielded by that signal,
[0010] characterised in that it comprises the steps of:
[0011] time dividing the time-evolving signal into sub-frames, and
[0012] for at least one sub-frame:
[0013] determining at least one parameter value yielded by the time-evolving signal over at least a part of that sub-frame, and
[0014] expressing that parameter value as a component of a vector along a dimension thereof specifically allocated to that sub-frame,
[0015] the resultant vector forming the signature of that gesture.
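The signature-forming steps of [0011] to [0015] can be illustrated with a minimal Python sketch. It assumes the time-evolving signal is already available as a list of envelope samples starting at the gesture onset, a known sampling rate `fs`, uniform sub-frames of duration `sub_frame_s`, and the time-averaged level as the parameter; the function and parameter names are hypothetical.

```python
from statistics import fmean


def signature(envelope, fs, sub_frame_s, n_sub_frames):
    """Return the signature vector S = [Psf1, ..., Psfn] (hypothetical helper).

    envelope     -- envelope samples starting at the gesture onset t0
    fs           -- sampling rate in Hz (assumed known)
    sub_frame_s  -- sub-frame duration D in seconds (assumed uniform)
    n_sub_frames -- number n of sub-frames, i.e. dimensions of S
    """
    per_frame = int(round(fs * sub_frame_s))  # samples per sub-frame
    s = []
    for i in range(n_sub_frames):
        chunk = envelope[i * per_frame:(i + 1) * per_frame]
        if not chunk:  # the signal ended before this sub-frame started
            break
        # The component along dimension i is the envelope level averaged
        # over sub-frame SFi, so each sub-frame owns exactly one dimension.
        s.append(fmean(chunk))
    return s
```

For example, `signature([1, 1, 2, 2, 3, 3], fs=2, sub_frame_s=1.0, n_sub_frames=3)` gives a three-dimensional signature, one component per one-second sub-frame.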
[0016] Optional features are presented below.
[0017] The steps of determining a parameter value and expressing the latter as a component of a vector can be performed cyclically on the fly as the time-evolving signal progresses to occupy successive sub-frames, so that the resultant vector forming the signature acquires an increasing number of dimensions during the progression of the time-evolving signal.
[0018] There can be provided a gesture identification mode for identifying an analysed gesture among a set of previously learnt gestures acquired in a learning mode, each learnt gesture being expressed as a signature vector, wherein a decision on the identification of a gesture under identification is produced while that gesture under identification is still giving rise to an active time-varying signal.
[0019] Each learnt gesture can have a determined number of dimensions corresponding to a determined number of sub-frames over which that learnt signature was obtained, wherein the decision on an identification is produced on the basis of fewer sub-frames covered by the time-varying signal of the gesture under identification, relative to that determined number of sub-frames.
[0020] The method can further comprise a gesture learning mode comprising, for a given class of gesture to be learnt, the steps of:
[0021] acquiring a set of signatures for that class of gesture by repeatedly producing that gesture and obtaining its signature,
[0022] storing the set of signatures, and
[0023] storing the correspondence relating that set of signatures to the learned class of gesture and, optionally, to a command associated to that class of gesture.
[0024] The method can further comprise a gesture identification mode, in which a gesture under identification is currently produced, comprising the steps of:
[0025] producing on the fly at least one partial signature vector of the gesture under identification, the partial signature vector being limited by the number of parameter value(s) currently available, and thereby having fewer dimensions than the learnt signature vectors,
[0026] for at least one partial signature vector produced, determining which signature among the learnt signatures best matches that partial signature vector, and
[0027] using the result of that determining step to produce a decision on the identification of the gesture under identification.
[0028] In one variant, the decision can be produced from a number of partial signatures obtained over a period bounded by a fixed limited number of sub-frames which is less than the total number of sub-frames used for obtaining the learnt signatures.
[0029] In another variant, the decision is produced from a number of partial signatures obtained over a period covering a number of sub-frames which is variable as a function of the result of the step of determining which signature among the learnt signatures best matches the partial signature vector.
[0030] Advantageously, the method further comprises a step of deriving an envelope yielded by the time-evolving signal, and wherein the parameter value is a parameter value of that envelope.
[0031] For instance, the parameter value can be a value of at least one of:
[0032] the signal level of the envelope,
[0033] the power level of the envelope,
[0034] the energy level of the envelope,
[0035] that value being preferably averaged over the duration of the sub-frame under consideration.
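The following sketch shows one way of deriving an envelope and computing the three candidate parameters for a sub-frame. The envelope method (full-wave rectification followed by a trailing moving average of `window` samples) is an assumption, as are the definitions used for power (mean squared envelope) and energy (squared envelope summed over the sub-frame and scaled by the sampling interval); none of these is specified in the text above.

```python
def envelope(raw, window):
    """Full-wave rectify `raw`, then smooth it with a trailing moving average.

    Rectification plus moving average is an assumed, commonly used way of
    obtaining an EMG envelope; it is not taken from the description.
    """
    rect = [abs(x) for x in raw]
    out = []
    acc = 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= window:
            acc -= rect[i - window]  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def sub_frame_params(chunk, fs):
    """Candidate parameter values for one sub-frame of envelope samples."""
    n = len(chunk)
    level = sum(chunk) / n                   # time-averaged signal level
    power = sum(v * v for v in chunk) / n    # time-averaged power
    energy = sum(v * v for v in chunk) / fs  # discrete integral of power
    return {"level": level, "power": power, "energy": energy}
```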
[0036] The duration of each sub-frame in the gesture identification mode is advantageously made equal to the duration of its corresponding sub-frame in the learning mode.
[0037] The sub-frames can be substantially uniform in duration.
[0038] The method can further comprise the step of monitoring the onset of a gesture by detecting the crossing of a threshold in the amplitude of the time-evolving signal, the start of the first sub-frame being made to coincide with the time of the crossing of that threshold.
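Onset monitoring can be sketched as follows, assuming a sampled signal and treating "crossing the threshold" as the first sample whose absolute amplitude reaches it. The helper names are hypothetical.

```python
def find_onset(signal, threshold):
    """Index of the first sample whose amplitude reaches `threshold`.

    That index marks time t0, where the first sub-frame SF1 starts.
    Returns None if the threshold is never crossed.
    """
    for i, x in enumerate(signal):
        if abs(x) >= threshold:
            return i
    return None


def sub_frame_starts(onset, size, n):
    """Start indices of n consecutive sub-frames of `size` samples each."""
    return [onset + k * size for k in range(n)]
```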
[0039] The time-evolving signal can be an electromyographic (EMG) signal from one of a plurality of sensors, each sensor output being processed as an independent channel.
[0040] The time-evolving signal can be obtained from at least one sensor positioned on a lower limb, preferably a forearm, and more preferably near the top of the forearm, to produce signals representative of a gesture involving a movement of an extremity of the limb, preferably a wrist, hand or fingers.
[0041] The time-evolving signal can be obtained from sensors at antagonistic muscle positions.
[0042] The method can be applied for commanding an apparatus, wherein at least one signature is made to correspond to a specific command.
[0043] The gesture can be made to correspond to a selected trigger command setting the apparatus to enter an operating state, e.g. play, skip, pause, etc.
[0044] At least one signature can be derived from a gesture taken from the following set of gestures:
[0045] producing a pointing finger,
[0046] producing an extended palm,
[0047] rotating a wrist,
[0048] forming a fist,
[0049] deflecting a wrist inwards,
[0050] deflecting a wrist outwards.
[0051] The method can be used to command a media player by means of at least one command gesture taken from the following set of command gestures:
[0052] producing a pointing finger, e.g. for commanding “play”,
[0053] producing an extended palm, e.g. for commanding “stop”,
[0054] rotating a wrist, e.g. for commanding “fast forward”, “rewind”, or “song seek/fast play”,
[0055] forming a fist, e.g. for pause/mute,
[0056] flexing a wrist (inward), e.g. for commanding “next track” or “step volume up”;
[0057] extending a wrist outwards, e.g. for “previous track” or “step volume down”.
[0058] The method can further comprise a step of generating a substantially continuously variable command output from a signal representative of muscular activity involved in producing a gesture.
[0059] The substantially continuously variable command output can be made to vary as a function of the natural oscillations of the signal representative of muscular activity involved in producing a gesture, and/or as a function of an envelope of that signal.
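As an illustration of a command output that varies as a function of the envelope, the sketch below maps an envelope level linearly onto a control value in [0, 1]. The linear mapping and the per-user calibration values `rest_level` and `max_level` are assumptions made for the example.

```python
def continuous_control(env_level, rest_level, max_level):
    """Map an envelope level linearly onto a control value in [0, 1].

    rest_level and max_level are hypothetical calibration values (the
    envelope level at rest and at maximum contraction for the user).
    """
    if max_level <= rest_level:
        raise ValueError("max_level must exceed rest_level")
    x = (env_level - rest_level) / (max_level - rest_level)
    return min(1.0, max(0.0, x))  # clamp to the valid control range
```

The resulting value could then be scaled to a concrete quantity, e.g. a volume setting, independently of or together with an identified gesture command.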
[0060] The substantially continuously variable command output can be independent of a command produced by gesture identification.
[0061] The substantially continuously variable command parameter can be made to control a quantity associated to a command produced by gesture identification.
[0062] The signal from which the substantially continuously variable command output is extracted and the signal from which said signature is obtained can originate from mutually different muscular regions, e.g. from respective arms.
[0063] The signal from which the substantially continuously variable command output is extracted and the signal from which the signature is obtained can originate from common muscular regions, e.g. from the same arm.
[0064] According to a second aspect, the invention relates to an apparatus for obtaining a signature of a gesture produced in free space, by deriving at least one time-evolving signal representative of muscular activity involved in producing the gesture and determining at least one value of a parameter yielded by that signal,
[0065] characterised in that it comprises:
[0066] means for time dividing the time-evolving signal into sub-frames,
[0067] means for determining at least one parameter value yielded by the time-evolving signal over at least a part of that sub-frame, and
[0068] means for expressing that parameter value as a component of a vector along a dimension thereof specifically allocated to that sub-frame,
[0069] the resultant vector forming the signature of that gesture.
[0070] According to a third aspect, the invention relates to an apparatus for executing a method according to the first aspect.
[0071] According to a fourth aspect, the invention relates to an apparatus according to the second or third aspect, housed in a unit wearable by a person and further comprising means for sending extracted command inputs to a command executing apparatus.
[0072] According to a fifth aspect, the invention relates to a wearable apparatus responsive to user commands, characterised in that it comprises an apparatus according to any one of the second to fourth aspects.
[0073] According to a sixth aspect, the invention relates to the use of the apparatus according to any one of the second to fifth aspects for sending user command inputs to an apparatus.
[0074] According to a seventh aspect, the invention relates to a computer program product loadable into the memory of a digital computer, comprising software code portions for performing the steps of any one of the appended claims
[0075] The invention and its advantages shall be more clearly understood from reading the following description of the preferred embodiments, given purely as non-limiting examples, with reference to the appended drawings in which:
[0085] Referring to
[0086] Note that the movement is performed in free space, in that there is no established absolute reference position for the arm or body: the same gesture can just as well be produced e.g. in front of the face.
[0087] The data acquired from each of the sensors are digitised and formatted for sending, e.g. by a wireless transfer protocol such as “Bluetooth”, to a data acquisition interface
[0088] The basic analysis performed by the gesture analysis unit
[0089] At the gesture analysis unit
[0090] The envelope
[0091] Starting from time t0, the evolving frame F is divided into successive sub-frames SF
[0092] In Euclidean space, the magnitude |S| of the signature vector from the point of origin is simply $|S| = \left[(P_{sf1})^2 + (P_{sf2})^2 + \dots + (P_{sfn})^2\right]^{1/2}$.
[0093] In the example, the parameter Pi considered is the time averaged signal level (volts) of the envelope
[0094] In variant embodiments of the invention, instead of being the level of the envelope signal, the parameter can e.g. be one or a combination of the following:
[0095] the average power of the envelope over the sub-frame considered (the envelope
[0096] the energy of the envelope,
[0097] the rate of change of signal level, power or energy,
[0098] etc.
[0099] Similarly, the time averaging of the parameter can instead be replaced by another mathematical property, e.g.:
[0100] the minimum or maximum value within the sub-frame,
[0101] the median value over a given set of sample values within the sub-frame,
[0102] etc.
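The alternatives to time averaging listed above can be expressed as interchangeable per-sub-frame statistics. In this sketch the aggregator table and helper names are hypothetical, and the rate of change is assumed to be taken between the values of consecutive sub-frames.

```python
from statistics import fmean, median

# Interchangeable statistics for turning one sub-frame of envelope
# samples into a single signature component.
AGGREGATORS = {
    "mean": fmean,
    "min": min,
    "max": max,
    "median": median,
}


def component(chunk, how="mean"):
    """One signature component from the envelope samples of a sub-frame."""
    return AGGREGATORS[how](chunk)


def rate_of_change(prev_value, value, sub_frame_s):
    """Change of a parameter between consecutive sub-frames, per second.

    Measuring the rate between sub-frames is an assumption made here.
    """
    return (value - prev_value) / sub_frame_s
```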
[0103] While the envelope
[0104] The signature is acquired in real time, i.e. on the fly, as the envelope
[0105] The signature vector S is represented graphically for the first three of its orthogonal components, each illustrated by a respective orthogonal axis APsf1, APsf2, APsf3 forming part of a multidimensional feature space
[0106] The signature S in that feature space representation is expressed as the position of a single mapped point with respect to an origin. Note that this point of the feature space representation starts to exist as soon as the first magnitude value Psf1 of the vector component Psf1 is determined, that point evolving in real time to occupy a successively increasing number of dimensions of the feature space
[0107] The above approach of acquiring the signature vector S of the envelope incrementally, as and when a vector component magnitude value becomes available, is advantageous for identifying a gesture in real time. It makes it possible to obtain, if need be, a gesture identification with only a subset of the full set of n vector components that completely specifies the envelope's signature, i.e. before considering all the possible time sub-frames, as explained in more detail further on.
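The incremental acquisition can be sketched as a Python generator that consumes envelope samples as they arrive and yields the partial signature each time a sub-frame completes. It assumes a fixed number of samples per sub-frame and the time-averaged level as the parameter; the names are hypothetical.

```python
from typing import Iterable, Iterator, List


def incremental_signature(samples: Iterable[float], per_frame: int) -> Iterator[List[float]]:
    """Yield the growing signature after each completed sub-frame.

    `samples` is a stream of envelope samples starting at t0 and
    `per_frame` is the (assumed fixed) number of samples per sub-frame.
    """
    signature: List[float] = []
    acc, count = 0.0, 0
    for x in samples:
        acc += x
        count += 1
        if count == per_frame:                 # sub-frame SFi is complete
            signature.append(acc / per_frame)  # new dimension Psfi
            yield list(signature)              # partial signature so far
            acc, count = 0.0, 0
```

Each yielded vector can be submitted for identification immediately, without waiting for the end of the frame F.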
[0108] When managing signals from several sensors
[0109] More particularly, in the example of a two-sensor system, the EMG signals from each of the two sensors
[0110] The two channels then together yield two respective parameter values P
[0111] There shall now be explained how this signature-forming technique can be used in a gesture identification application, e.g. for recognising the class of a gesture being analysed among a set of learnt gesture classes. Such an application comprises a learning mode (also referred to as a training mode) and an identification mode. Here, the term “class” is used for identifying a type of gesture: in the convention used here, each specific type of gesture considered (e.g. pointing, rotating the wrist, etc.) is ascribed to a respective class designated Gx, where the suffix x is a unique identifier of the gesture. The procedure begins at a stage where the duration D and number of sub-frames SF have already been determined experimentally. These factors are considered in correspondence with the length of the complete frame F, itself determined in terms of the start threshold Sthresh and a termination threshold Tthresh that corresponds to the envelope dropping below a prescribed level.
[0112] In the learning mode, envelopes are analysed over all the predefined sub-frames SF
[0113] A possible protocol for the learning mode is given by the flow chart of
[0114] The memory further stores the correspondence between the set of signatures {S
[0115] In other words, the memory contains an indexed correspondence table linking together the signatures vectors S
[0116] In applications where the gesture identification is not used for command purposes, the above correspondence to the command Cx can simply be ignored.
[0117] The above process is carried out in the same manner for each new class of gesture to be learnt. Thus, in the general case of m different classes of gesture learnt, assuming each class is sampled over the same number y of repeated gestures, there shall be a total of m*y stored signatures, which collectively form the learnt corpus, each with a correspondence table entry relating it to a specific gesture and, when the application so requires, to a specific command.
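A possible in-memory layout for the learnt corpus and its correspondence table is sketched below. The `Corpus` class is hypothetical; it simply groups the y signatures of each class Gx and optionally records the associated command Cx.

```python
from collections import defaultdict


class Corpus:
    """Hypothetical learnt corpus: signatures grouped by class, plus commands."""

    def __init__(self):
        self.signatures = defaultdict(list)  # class Gx -> [S1, ..., Sy]
        self.commands = {}                   # class Gx -> command Cx

    def learn(self, gesture_class, signatures, command=None):
        """Store the signatures of repeated gestures for one class."""
        self.signatures[gesture_class].extend(signatures)
        if command is not None:  # commands are optional, cf. [0116]
            self.commands[gesture_class] = command

    def size(self):
        """Total number of stored signatures (m*y for uniform sampling)."""
        return sum(len(v) for v in self.signatures.values())
```

With m = 2 classes each learnt from y = 2 repetitions, `size()` returns 4.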
[0118] For each class of gesture Gx, the y signature vectors, while not being identical owing to natural human inconsistencies, nevertheless lie within a common range of vector values. In a feature space representation, this common range would correspond to a cluster of y points plotted for the learnt gesture Gx. The learnt signature points of a cluster should preferably exhibit the least possible variance.
[0119] It is ensured in the learning phase that the m above ranges are well separated (or, equivalently, that the m corresponding clusters in feature space are well separated), so that each of the learnt gestures can be clearly distinguished.
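The spread and separation requirements can be checked with a simple heuristic, sketched below. The criterion used here (every pair of class centroids must be further apart than `margin` times the sum of the two class spreads) is an assumption for illustration, not a rule given in the description.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a cluster of signature vectors."""
    return [fmean(coords) for coords in zip(*points)]


def spread(points):
    """Mean Euclidean distance of a class's learnt points to their centroid."""
    c = centroid(points)
    return fmean(dist(p, c) for p in points)


def well_separated(clusters, margin=2.0):
    """Heuristic check that all pairs of class clusters are well apart.

    clusters -- dict mapping class Gx to its list of learnt signatures
    margin   -- hypothetical safety factor on the summed spreads
    """
    names = list(clusters)
    cents = {k: centroid(clusters[k]) for k in names}
    spr = {k: spread(clusters[k]) for k in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(cents[a], cents[b]) <= margin * (spr[a] + spr[b]):
                return False
    return True
```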
[0120] Once a required number of gestures have been learnt as above, the gesture identification mode can be initiated. In that mode, a gesture produced, hereafter referred to as a “candidate” gesture (or gesture under identification) Gc, is analysed to determine its class G. Once identified, the gesture can then be used for sending a corresponding command C or for generating some feedback or other response.
[0121] The identification mode comprises extracting the signature vector Sc of the candidate gesture under the same parameter settings as in the learning mode. In particular, the timings and durations D of the sub-frames used in the learning mode are kept for the identification mode. However, as explained below, only some of the sub-frames will generally be considered, the identification thus being based on a partial form of the signature for the candidate gesture.
[0122] Then, a comparison technique, typically a nearest neighbour analysis, is applied to that partial signature Sc to determine a closest match among the signatures of the learnt corpus. The candidate signature Sc is then attributed as having for origin the same class of gesture as the one of the matching signature in the learnt corpus.
[0123] The core aspect of the gesture identification routine according to an embodiment is shown by the flow chart of
[0124] The procedure starts by detecting the onset of a gesture to be identified Gc in terms of the crossing of threshold Sthresh (step A
[0125] The class of the gesture attributed to the thus-identified nearest neighbour in the learnt corpus is recorded for later reference (step A
[0126] The procedure then loops back to the step A
[0127] Note that steps A
[0128] A constraint can be added whereby the Euclidean distance in vector space between the candidate signature Sc and the most closely matching signature of the learnt corpus must not exceed a prescribed limit value δ. This serves to filter out spurious or insufficiently identifiable gestures that could otherwise give rise to an unacceptable identification error rate.
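A nearest-neighbour match of a partial candidate signature against the learnt corpus, including the optional distance limit δ, might look like the sketch below. It assumes the corpus is a flat list of (class, full signature) pairs and that the partial signature is compared with the first k components of each learnt signature; names are hypothetical.

```python
from math import dist


def identify_partial(partial, corpus, delta=None):
    """Nearest-neighbour match of a partial signature against a corpus.

    partial -- components Psf1..Psfk available so far (k <= n)
    corpus  -- list of (gesture_class, full_signature) pairs
    delta   -- optional limit on the Euclidean distance of the best match
    Returns (gesture_class, distance), or None if no match within delta.
    """
    k = len(partial)
    best = None
    for gesture_class, sig in corpus:
        d = dist(partial, sig[:k])  # compare on the first k dimensions only
        if best is None or d < best[1]:
            best = (gesture_class, d)
    if best is None or (delta is not None and best[1] > delta):
        return None  # spurious or insufficiently identifiable gesture
    return best
```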
[0129] In the example, the determination of the best match in the nearest neighbour analysis (step A
[0130] A decision algorithm is made to come into play (step A
[0131] Once that algorithm has made its choice, the class of gesture attributed to that chosen nearest neighbour signature is retrieved from the appropriate entry in the correspondence table and assigned to the candidate signature vector (step A
[0132] Note that in the preferred embodiment, the search for identification of the candidate signature is conducted on the fly, i.e. in real time, as can be seen from the time scale on the right of
[0133] In one form, this is achieved by establishing beforehand that a decision on the identification must be taken on the basis of a limited set of sub-frames among the n sub-frames covering the evolution of the envelope for the candidate gesture. For instance, it can be imposed that a decision must be reached on the basis of the parameter values taken from the first r successive sub-frames, starting from the first SF
[0134] This form of execution is illustrated by
[0135] As soon as the parameter value for the first sub-frame SF
[0136] The result of that early comparison is recorded in terms of the corresponding gesture for the closest match.
[0137] The above is repeated as soon as the parameter value for the next sub-frame SF
[0138] The procedure is thus continued for each sub-frame up to and including the sub-frame SFr, the signature vector submitted for identification each time being constructed from the components already acquired, supplemented by the currently acquired component.
[0139] When all the r sub-frames SF
[0140] The above is an example where the end of the identification decision process is fixedly bounded in time (the data acquisitions exploited for outputting at least a first decision ending at time tr). One of its advantages is that it guarantees that a decision is output after a known and constant time period starting from the moment t0.
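For the time-bounded variant, the per-sub-frame identification results for SF1 to SFr have to be turned into one decision. The description does not specify the decision rule at this point; the sketch below assumes a simple majority vote, ignoring rejected matches (None) and returning None on a tie so that the person can be prompted to repeat the gesture.

```python
from collections import Counter


def decide_fixed(results, r):
    """Decide from the identification results of sub-frames SF1..SFr.

    results -- class labels in time order, None for rejected matches
    Majority voting is an assumed rule. Returns the winning class, or
    None when there is no clear winner.
    """
    votes = Counter(x for x in results[:r] if x is not None)
    if not votes:
        return None
    (top, n_top), *rest = votes.most_common()
    if rest and rest[0][1] == n_top:
        return None  # tie between classes: no clear decision
    return top
```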
[0141] In accordance with another approach, the moment of decision in the identification is variable in time, that moment being determined instead by the consistency of the results currently available.
[0142] This form of execution is illustrated by
[0143] Here, the procedure is as above as regards establishing the signature vector components for successive sub-frames and obtaining a gesture identification each time. However, in this case, the successively obtained identification results are compared on the fly. Once it is determined, in accordance with pre-established rules, that the results available tend predominantly to one particular class of gesture, the decision to identify the candidate gesture as belonging to that class is produced. The decision is thus output for exploitation at the conclusion of a variable number of processed sub-frames, that number depending on the consistency of the accumulated identification results. The evolution of the input to the signature vector is illustrated in
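The variable-moment variant needs a consistency rule. The sketch below assumes one possible "pre-established rule": a decision is output as soon as `k` consecutive sub-frames yield the same class. It returns the class and the number of sub-frames consumed, or None if no decision is reached before the end of the frame F.

```python
from typing import Iterable, Optional, Tuple


def decide_variable(results: Iterable[Optional[str]], k: int = 3) -> Optional[Tuple[str, int]]:
    """Output a decision once `k` consecutive sub-frames agree on a class.

    The "k identical results in a row" rule is an assumed example of a
    consistency rule; None entries (rejected matches) break a run.
    """
    current, run = None, 0
    for i, label in enumerate(results, start=1):
        if label is not None and label == current:
            run += 1
        else:
            current, run = label, (1 if label is not None else 0)
        if run >= k:
            return current, i  # class and number of sub-frames used
    return None
```

With `["G2", "G1", "G1", "G1", "G2"]` and `k=3`, the decision for G1 is reached after four sub-frames, before the last result is even available.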
[0144] In both approaches, vector components from further sub-frames can be submitted to nearest neighbour analysis, e.g. to provide a possibility of correcting an earlier erroneous decision that has been previously outputted.
[0145] In either approach, a visual or audio prompt is sent for the person to repeat the gesture if the results obtained do not allow a satisfactory decision to be produced at the conclusion of either the r sub-frames (time-bounded variant) or the complete frame F (variable moment of decision output variant), e.g. if too many results fall outside the imposed limit δ.
[0146] The possibility thus afforded of producing a decision on an identified candidate gesture as soon as a satisfactory correlation with a class of the learnt corpus has been found greatly contributes to the speed of identification and to giving the person the impression of a highly responsive and intuitive gesture interfacing.
[0147] The gesture identification in accordance with the invention is particularly well suited for identifying gestures expressed at the level of the hand and wrist.
[0148] For instance, each of
[0149] In the illustrated example, each gesture is associated to a specific compact disk player transport command as follows:
[0150] a) pointing finger=>play, cf.
[0151] b) palm extended=>stop, cf.
[0152] c) wrist rotating (either clockwise or anticlockwise)=>seek-fast forward/song search (i.e. fast play of CD to advance quickly along a track), cf.
[0153] d) forming fist=>pause or mute, cf.
[0154] e) wrist flexion (inward flick of the wrist)=>jump to next track, cf.
[0155] f) wrist extension (outward flick of the wrist)=>jump to previous track, cf.
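The gesture-to-command assignment of items a) to f) amounts to a lookup table. In the sketch below the class identifiers and command strings are hypothetical labels; "pause" stands for the pause-or-mute command of item d).

```python
# Gesture class -> CD player transport command, following items a) to f).
# Class identifiers and command names are hypothetical labels.
COMMANDS = {
    "pointing_finger": "play",
    "palm_extended": "stop",
    "wrist_rotation": "seek",         # fast forward / song search
    "fist": "pause",                  # or mute, depending on the application
    "wrist_flexion": "next_track",
    "wrist_extension": "previous_track",
}


def command_for(gesture_class):
    """Return the command for an identified class, or None if unmapped."""
    return COMMANDS.get(gesture_class)
```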
[0156] The above six gestures have been found by the Applicant to represent particularly judicious choices as regards their intuitive expression, ease of execution in free space, and ability to be clearly distinguished from one another by the above gesture identification technique.
[0158] The sensors
[0159] At the receiving end, the sent raw EMG signals are detected by a Bluetooth interface
[0160] The envelope
[0161] The frame start and end detection unit
[0162] Each time an average signal level value for a sub-frame is obtained, it is sent to a real-time signature vector forming unit
[0163] The latter unit accesses in real time the storage unit
[0164] The signature comparison/identification unit
[0165] The command formatting unit
[0166] The overall operation of the entire device is handled by a central management unit
[0167] The different operational parameters of the gesture identification system
[0168] The software code for executing at least some of the different operations executed by the above-described units
[0169] The latter is insertable into an appropriate reader of the central management unit
[0170] The raw EMG signals—or other signals expressing muscular activity—can also be used separately or in combination with the above-described gesture analysis technique to express more continuous forms of commands or signal inputs to a responsive system, as illustrated with the example of
[0171] This system is composed of a user biosignal command generator
[0172] In the example, the user biosignal command generator
[0173] In parallel, the user can enter music “sculpting” commands, i.e. commands that influence characteristics of the sound produced by the music player
[0174] The outputs from the second set of sensors
[0175] The latter is programmed to analyse and extract characteristic properties available from either or both of the preprocessed raw right arm EMG signal
[0176] The sound processing unit
[0177] The above is an example of a substantially continuously variable command output, in this case extracted from the signal of the second armband
[0178] In other embodiments and applications, however, the continuously variable command parameter can be made instead to bear directly on a concomitant command produced by gesture identification. This can be achieved in different ways.
[0179] For instance:
[0180] a) by using separate muscular regions, e.g. from right and left forearms, using the configuration shown in
[0181] b) by using substantially the same muscular regions, so that the same gesture serves to produce signals both for the identified gesture command and for expressing the substantially continuously variable quantity to be associated with that command. As an example, the finger pointing gesture described earlier can be made both to set the “play” mode and to determine the play level or the play start position, the latter being determined as a function of the intensity of the gesture (speed, vigour, tension, or amplitude), as detected directly from the sensor signals and their envelope. In this combined gesture approach, the same sensors can be used for both the command and the associated substantially continuously variable quantity, whereupon the signals designated
[0182] The analysis of the sensor signal
[0183] It is clear that the present invention can be practised in many forms as regards applications, software and hardware implementations, sensor technology, sensor positioning, gestures to be analysed, commands or feedback to be produced, etc.
[0184] Moreover, the complete hardware for performing gesture identification and command execution functions (i.e. units
[0185] In addition to the applications and implementations already described above, the invention can find use in:
[0186] consumer electronics: media players, video games, music instrument control, commanding domestic appliances (lamps, kitchen and garden equipment, sound distribution, motorised devices, etc.),
[0187] professional electronics: recording studios, stage lighting control, etc.,
[0188] art forms: controlling musical instruments, graphic animation, colour generation, DJ sound effects (scratching, mixing, fading, etc.),
[0189] institutional environments: tuition (analysing playing styles, body expression, etc.),
[0190] health and care: automatic interpretation of sign language for transcription by voice synthesis or display of symbols or typed words,
[0191] industry: control of on-line process parameters, machinery, signalling by gestures,
[0192] transport: controlling land, air or sea vehicles by gesture commands,
[0193] command, control and intelligence,
[0194] etc.