Title:
System and method for detecting location of sound sources in a three-dimensional space
Kind Code:
A1


Abstract:

System and method for detecting location of sound sources in a three-dimensional space using linear microphone arrays comprising a plurality of microphones wherein the microphones are located in at least two groups of at least two microphones wherein each group has a different spacing of the respective microphones. Three such linear microphone arrays are used wherein first ends of all three microphone arrays, comprising the same arrangement of microphones, are in proximity or adjacent to each other; and the separate microphone arrays are positioned in different planes in three-dimensional space.




Inventors:
Paczkowski, Jacek (Patents Factory Ltd. Sp. z o.o., Spawaczy 3b/2, 65-119 Zielona Góra, PL)
Nalewa, Tomasz (Patents Factory Ltd. Sp. z o.o., Kossaka 83, 65-140 Zielona Góra, PL)
Kramek, Krzysztof (Patents Factory Ltd. Sp. z o.o., Spawaczy 3b/8, 65-119 Zielona Góra, PL)
Application Number:
EP20140461561
Publication Date:
02/24/2016
Filing Date:
08/21/2014
Assignee:
Patents Factory Ltd. Sp. z o.o. (Boleslawa Chrobrego 5, 65-043 Zielona Gora, PL)
International Classes:
H04R1/40; H04R3/00
Domestic Patent References:
EP2592846, N/A, 2013-05-15



Foreign References:
5848170, 1998-12-08
20140192999, 2014-07-10
7090117, 2006-08-15
Other References:
ZHENG Y R ET AL: "EXPERIMENTAL EVALUATION OF A NESTED MICROPHONE ARRAY WITH ADAPTIVE NOISE CANCELLERS", IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 53, no. 3, 1 June 2004 (2004-06-01), pages 777 - 786, XP001199042, ISSN: 0018-9456, DOI: 10.1109/TIM.2004.827304
DJAMILA MAHMOUDI ET AL: "MULTIRESOLUTION MICROPHONE ARRAY FOR SPEECH SOURCE ACQUISITION AND TRACKING", INTERNATIONAL WORKSHOP ON ACOUSTIC ECHO AND NOISE CONTROL (IWAENC'97), 11 September 1997 (1997-09-11), pages 116 - 119, XP055166115
FLANAGAN J L ET AL: "COMPUTER-STEERED MICROPHONE ARRAYS FOR SOUND TRANSDUCTION IN LARGE ROOMS", THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AMERICAN INSTITUTE OF PHYSICS FOR THE ACOUSTICAL SOCIETY OF AMERICA, NEW YORK, NY, US, vol. 78, no. 5, 1 November 1985 (1985-11-01), pages 1508 - 1518, XP000195783, ISSN: 0001-4966, DOI: 10.1121/1.392786
Attorney, Agent or Firm:
Blonski, Pawel (EP-Patent Konstruktorow 30/2, 65-119 Zielona Gora, PL)
Claims:
1. A linear microphone array comprising a plurality of microphones the microphone array being characterized in that: • the microphones are located in at least two groups (103A-C) of at least two microphones whereas each group has a different spacing of the respective microphones.

2. The linear microphone array according to claim 1 characterized in that there are five groups of microphones each comprising at least two microphones wherein spacing of respective microphones in groups is such that in a subsequent group the spacing is twice of that of the preceding group.

3. The linear microphone array according to claim 1 characterized in that there are five groups of microphones and that the first group comprises seventeen microphones, while the remaining four groups comprise eight microphones each.

4. A linear microphone system using three linear microphone arrays according to claim 1, the system being characterized in that: • first ends of all three microphone arrays, comprising the same arrangement of microphones, are in proximity or adjacent to each other; and • the separate microphone arrays are positioned in different planes in three-dimensional space.

5. The linear microphone system according to claim 4 characterized in that the other ends of the microphone arrays linearly extend on X, Y and Z axis respectively.

6. A method for sound source localization using a microphone array, the method being characterized in that it comprises the steps of: • positioning the system according to claim 4 in a detection area; • assigning each group of microphones within each microphone array to a non-overlapping frequency band wherein the higher the frequency the lower the spacing of microphones; • for each of the microphone arrays executing the steps of: • filtering (401) sounds from each microphone with band-pass filters into sub-bands; • selecting active microphones (402) depending on the selected sub-band that is associated with microphones spacing; • selecting (403), for the selected active microphones and sub-band, appropriate samples wherein the higher the sub-band frequency the more samples are selected whereas sampling frequency is greater than the frequency of the sampled sub-band having the highest frequency; • selecting (404) angular sampling density based on band frequency; • calculating a delay (405), for each value of angle α within a range of -90° to +90°, with which sound will arrive to each microphone from a given direction assuming a distance from a sound source is infinite; • calculating (406) a sample of sound for a given direction by adding sound of all active microphones taking delays into account for a given angle; • calculating (407) signal strength arriving from each of the tested directions by summing absolute values of N directional samples thereby obtaining signal strength curve values; • detecting (408) local maxima and assuming them as detection result; • using sound source angles obtained for each of the microphone arrays in order to determine sound source localization by means of triangulation.

7. The method according to claim 6 characterized in that selected samples are all samples in case of over 15kHz band, every second sample for a band of 8..15kHz and every fourth sample for a band below 8kHz.

8. The method according to claim 6 characterized in that the angular sampling density is selected such that a round angle is divided into kk sections wherein a division coefficient kk is 360 except for band 0..500Hz wherein the kk=60 and 7..8kHz band wherein the kk = 120.

9. The method according to claim 6 characterized in that delay calculation step assumes that a reference point is the leftmost microphone whereas for angle -90° the delay will have a value calculated by the following equation: $dt = \frac{dx}{V_s}$
where dx is a distance between a given microphone and a reference microphone and $V_s$ is a speed of sound, whereas a coefficient of delay dependent on sampling equals: $\mathit{dt2} = dt \cdot F_s$
where $F_s$ is a sampling frequency; whereas the calculated delay equals: $\mathit{dt3} = \mathit{dt2} \cdot \sin(90^\circ - \alpha)$
where α is an angle selected for evaluation.

10. The method according to claim 6 characterized in that the detected local maxima are compared (409) with other sub-bands whereas in case a sound source is present in a plurality of sub-bands where the angles are similar, a resulting angle may be calculated as a weighted average.

11. A computer program comprising program code means for performing all the steps of the computer-implemented method according to claim 6 when said program is run on a computer.

12. A computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to claim 6 when executed on a computer.

13. A system for sound source localization comprising • a microphone array; • a data bus (701) communicatively coupling components of the system; • a memory (704) for storing data; • a controller (705); • a sampling module (703);
the system being characterized in that it comprises: • the microphone array system (702) according to claim 2; • whereas the controller (705) is configured to control the sampling module (703) and to execute all steps of the method according to claim 6.

Description:

The present invention relates to a system and method for detecting location of sound sources in a three-dimensional space. In particular the present invention relates to detecting an angle at which a sound source is located with respect to a linear microphone array.

Prior art includes US 7090117 B2, entitled "Microphone array and digital signal processing system", which discloses a digital microphone array configured in an open geometry such as a sphere with a large number of inexpensive microphone elements mounted in opposite-facing pairs. The microphone array with DSP is intended to be placed in a three-dimensional sound field, such as a concert hall or film location, and to completely isolate all sound sources from each other while maintaining their placement in a coherent sound field including reverberance.

It would be advantageous to provide a simpler microphone array capable of executing sound localization.

The publication EP2592846A1 discloses a method and an apparatus for processing signals of a spherical microphone array on a rigid sphere used for generating an Ambisonics representation of the sound field, wherein an equalisation filter is applied to the inverse microphone array response.

The aim of the development of the present invention is an improved, more accurate and resource- and cost-effective system and method for detecting location of sound sources in a three-dimensional space.

An object of the present invention is a linear microphone array comprising a plurality of microphones wherein the microphones are located in at least two groups of at least two microphones whereas each group has a different spacing of the respective microphones.

Preferably, there are five groups of microphones each comprising at least two microphones wherein spacing of respective microphones in groups is such that in a subsequent group the spacing is twice of that of the preceding group.

Preferably, there are five groups of microphones and that the first group comprises seventeen microphones, while the remaining four groups comprise eight microphones each.

Another object of the present invention is a linear microphone system using three linear microphone arrays according to the first object of the present invention, the system being characterized in that first ends of all three microphone arrays, comprising the same arrangement of microphones, are in proximity or adjacent to each other; and the separate microphone arrays are positioned in different planes in three-dimensional space.

Preferably, the other ends of the microphone arrays linearly extend on X, Y and Z axis respectively.

A further object of the present invention is a method for sound source localization using a microphone array, the method comprising the steps of:

  • positioning the system according to the second object of the present invention in a detection area;
  • assigning each group of microphones within each microphone array to a non-overlapping frequency band wherein the higher the frequency the lower the spacing of microphones;
  • for each of the microphone arrays executing the steps of:
      • filtering sounds from each microphone with band-pass filters into sub-bands;
      • selecting active microphones depending on the selected sub-band that is associated with microphones spacing;
      • selecting, for the selected active microphones and sub-band, appropriate samples wherein the higher the sub-band frequency the more samples are selected whereas sampling frequency is greater than the frequency of the sampled sub-band having the highest frequency;
      • selecting angular sampling density based on band frequency;
      • calculating a delay, for each value of angle α within a range of -90° to +90°, with which sound will arrive to each microphone from a given direction assuming a distance from a sound source is infinite;
      • calculating a sample of sound for a given direction by adding sound of all active microphones taking delays into account for a given angle;
      • calculating signal strength arriving from each of the tested directions by summing absolute values of N directional samples thereby obtaining signal strength curve values;
      • detecting local maxima and assuming them as detection result;
  • using sound source angles obtained for each of the microphone arrays in order to determine sound source localization by means of triangulation.

Another object of the present invention is a computer program comprising program code means for performing all the steps of the computer-implemented method according to the present invention when said program is run on a computer.

Another object of the present invention is a computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to the present invention when executed on a computer.

These and other objects of the invention presented herein are accomplished by providing a system and method for detecting location of sound sources in a three-dimensional space. Further details and features of the present invention, its nature and various advantages will become more apparent from the following detailed description of the preferred embodiments shown in a drawing, in which:

  • Fig. 1 shows a microphone array;
  • Fig. 2 shows an activation scheme of microphone arrays;
  • Figs. 3A-B depict a microphone array system;
  • Fig. 4 presents a diagram of the method according to the present invention;
  • Fig. 5 depicts an exemplary division of an acoustic band;
  • Fig. 6 presents an example of an answer of the detector;
  • Fig. 7 presents a diagram of the system according to the present invention; and
  • Fig. 8 shows an installation of the system in a room.

NOTATION AND NOMENCLATURE

Some portions of the detailed description which follows are presented in terms of data processing procedures, steps or other symbolic representations of operations on data bits that can be performed on computer memory. Therefore, a computer executes such logical steps thus requiring physical manipulations of physical quantities.

Usually these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. For reasons of common usage, these signals are referred to as bits, packets, messages, values, elements, symbols, characters, terms, numbers, or the like.

Additionally, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Terms such as "processing" or "creating" or "transferring" or "executing" or "determining" or "detecting" or "obtaining" or "selecting" or "calculating" or "generating" or the like, refer to the action and processes of a computer system that manipulates and transforms data represented as physical (electronic) quantities within the computer's registers and memories into other data similarly represented as physical quantities within the memories or registers or other such information storage.

A computer-readable (storage) medium, such as referred to herein, typically may be non-transitory and/or comprise a non-transitory device. In this context, a non-transitory storage medium may include a device that may be tangible, meaning that the device has a concrete physical form, although the device may change its physical state. Thus, for example, non-transitory refers to a device remaining tangible despite a change in state.

DESCRIPTION OF EMBODIMENTS

A microphone array according to the present invention comprises, as shown in Fig. 1, a supporting body 101 and linearly, spatially located microphones 102A-L wherein the microphones are located in at least two groups 103A-C of at least two microphones whereas each group has a different spacing of the respective microphones.

The microphones 102 are preferably located on a straight line such that a first group of microphones comprises microphones spaced by for example 6.25 mm, the second group of microphones comprises microphones spaced by for example 12.5 mm, the third group of microphones comprises microphones spaced by for example 25 mm, the fourth group of microphones comprises microphones spaced by for example 50 mm and the fifth group of microphones comprises microphones spaced by for example 100 mm. Therefore, there are five groups each comprising at least two microphones wherein spacing of respective microphones in groups is such that in each subsequent group the spacing is for example twice that of the preceding group.

Preferably, the first group comprises 17 microphones, while the remaining four groups comprise eight microphones each. This number is a preferred arrangement as shown by experiments and evaluation of response curve at different numbers of microphones in arrays.

Such an arrangement is due to the different geometric spacing of the microphones required for different sound frequencies. A general rule is that the frequency bands of the microphone groups are non-overlapping and the higher the frequency the lower the spacing of microphones. Therefore, for sounds up to 1 kHz, 17 microphones spaced by 100 mm are used. For the band 1..4 kHz there may be used 17 microphones spaced by 50 mm, for the band 4..8 kHz there may be used 17 microphones spaced by 25 mm, for the band 8..15 kHz there may be used 17 microphones spaced by 12.5 mm and for the band over 15 kHz there may be used 17 microphones spaced by 6.25 mm.
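The nesting implied by these numbers can be sketched as follows. This is only an illustration under two assumptions that the description does not state outright: each coarser group continues the line from the end of the preceding group, so that every band finds 17 microphones at its own spacing, and a frequency lying exactly on a band boundary is assigned to the lower band. The names `build_positions`, `spacing_units_for` and `active_positions` are hypothetical.

```python
BASE_SPACING_MM = 6.25          # finest spacing named in the description
GROUP_SIZES = (17, 8, 8, 8, 8)  # microphones per group, finest group first


def build_positions():
    """Microphone positions in units of BASE_SPACING_MM (assumed nested layout)."""
    positions = [0]              # reference microphone at the first end
    step = 1
    for index, size in enumerate(GROUP_SIZES):
        # The first group already contains the reference microphone.
        count = size - 1 if index == 0 else size
        for _ in range(count):
            positions.append(positions[-1] + step)
        step *= 2                # each subsequent group doubles the spacing
    return positions


def spacing_units_for(frequency_hz):
    """Spacing, in base units, of the band a frequency falls into."""
    if frequency_hz <= 1_000:
        return 16                # 100 mm
    if frequency_hz <= 4_000:
        return 8                 # 50 mm
    if frequency_hz <= 8_000:
        return 4                 # 25 mm
    if frequency_hz <= 15_000:
        return 2                 # 12.5 mm
    return 1                     # 6.25 mm


def active_positions(positions, frequency_hz):
    """Positions in mm of the microphones active for a given frequency."""
    step = spacing_units_for(frequency_hz)
    return [p * BASE_SPACING_MM for p in positions
            if p % step == 0 and p <= 16 * step]


positions = build_positions()
active = active_positions(positions, 10_500)
print(len(positions), len(active), active[1] - active[0])
```

Under these assumptions the array holds 49 microphones in total, and a 10.5 kHz sub-band selects 17 of them at 12.5 mm spacing.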

Fig. 2 shows five microphone arrays 203A - 203E according to the present invention, wherein active microphones are marked with a thick border 201 and inactive microphones are marked with a thin border 202. For the sake of simplicity and readability of the drawing, the microphones are equally spaced on the drawing but in reality are spaced according to the spacing factor X to 16X as indicated in the figure.

The microphone array 203A on the bottom is configured for band up to 1 kHz, the microphone array 203B is configured for a band 1..4kHz, the microphone array 203C is configured for a band 4..8kHz, the microphone array 203D is configured for a band 8..15kHz and the microphone array 203E is configured for a band over 15kHz. The configuration denotes herein a selection of particular active microphones whereas being active means that samples from this microphone will be taken into account in signal analysis.

A single linear microphone array according to the present invention allows for determining an angle, at which sound sources are located with respect to this linear microphone array. In order to obtain full information regarding location of sound sources in 3D space, it is necessary to apply at least three microphone arrays.

The microphone arrays must be spaced for example by 90 degrees wherein first ends of all microphone arrays (comprising the same arrangement of microphones) are in proximity or adjacent to a virtual center of a circle as shown in Fig. 3A. Fig. 3A shows a view in a single plane but the separate microphone arrays must be positioned in different planes in 3D space. Preferably, the other ends of microphone arrays linearly extend on X, Y and Z axis respectively (for example forming three edges of a cube as shown in Fig. 3B). Such a microphone system may be located in a corner of a room near the ceiling.

Having three detected angles (one per each microphone array), such a microphone system is able to determine location of each detected sound source by means of triangulation.

Depending on the frequency of sounds to be detected, it is necessary to apply an appropriate sampling frequency, which is greater than the upper frequency of the sampled band in case the band is bounded, or greater than the lower frequency of the sampled band in case the band is an open range, e.g. above 15 kHz. Hence for sounds up to 8 kHz the sampling frequency is for example 48 kHz, for sounds in band 8..15 kHz the sampling frequency is for example 96 kHz and for sounds above 15 kHz the sampling frequency is for example 192 kHz.

Typically, the analysis is executed in a full band, which requires a sampling frequency of 192 kHz. For a band over 15 kHz all samples will be analyzed, for a band in range of 8..15 kHz every second sample will be analyzed while for a band below 8 kHz every fourth sample will be analyzed (in order to decrease the sample rate). Similarly, all microphones will be sampled but only samples from the active microphones will be subject to sub-band analysis (in case of sampling frequency there is a different division into groups than in case of microphones spatial location).
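The sample selection described above amounts to keeping every n-th sample of the 192 kHz stream. A minimal sketch, assuming the signal has already been band-pass filtered into the band in question and that the band is identified by its lower edge; the function names are hypothetical:

```python
FULL_RATE_HZ = 192_000  # full-band sampling frequency from the description


def decimation_factor(band_low_hz):
    """Keep every n-th sample, chosen by the lower edge of the band."""
    if band_low_hz >= 15_000:
        return 1   # over 15 kHz: all samples
    if band_low_hz >= 8_000:
        return 2   # 8..15 kHz: every second sample
    return 4       # below 8 kHz: every fourth sample


def select_samples(samples, band_low_hz):
    """Return the retained samples and the resulting effective rate in Hz."""
    factor = decimation_factor(band_low_hz)
    return samples[::factor], FULL_RATE_HZ // factor


kept, rate = select_samples(list(range(16)), 4_000)
print(kept, rate)
```

The effective rates that result (192 kHz, 96 kHz and 48 kHz) match the example sampling frequencies given for the three ranges.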

Fig. 4 presents a diagram of the method according to the present invention wherein the system of Fig. 3B is applied. The method starts at step 401 from filtering sounds from each microphone with band-pass filters. Fig. 5 depicts an exemplary division of an acoustic band. All bands are analyzed simultaneously and are therefore all sampled. However, for a given sub-band analysis there are selected only microphones that are appropriate for the selected band. Samples from some microphones will be used only for a single sub-band while some others will be used for more sub-bands; for example the edge microphone, spaced by 6.25 mm from its neighbour, will be used in all sub-bands. The appropriate microphones are selected at step 402.

Selection of appropriate microphones depends on the selected sub-band that is associated with microphones spacing. In Fig. 2 there are five bands into which a sub-band falls thereby defining the active microphones. For example a sub-band 12 of 10-11 kHz falls into the band 8..15 kHz, therefore the spacing of active microphones is assumed as in 203D and these microphones are appropriate for this sub-band.

Next, at step 403, for the selected set of microphones and sub-band there are selected appropriate samples (e.g. all in case of over 15 kHz band, every second sample for a band of 8..15 kHz and every fourth sample for a band below 8 kHz).

Further, at step 404 there is selected angular sampling density based on the band frequency. In particular, for each sub-band there is executed detection of sound sources. A round angle is divided into kk sections (selection of an angle between samples). A division coefficient kk may have a value of 360 except for band 0..500Hz wherein the preferred value of kk=60 and 7..8kHz band wherein the preferred value of kk = 120.
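The relation between the division coefficient kk and the set of tested angles can be sketched as follows. This is an illustration only: it assumes the angle between neighbouring test directions is 360°/kk and that sub-bands are identified by their edges in Hz; `division_coefficient` and `candidate_angles` are hypothetical names.

```python
def division_coefficient(sub_band_low_hz, sub_band_high_hz):
    """kk for a sub-band, using the preferred values from the description."""
    if (sub_band_low_hz, sub_band_high_hz) == (0, 500):
        return 60
    if (sub_band_low_hz, sub_band_high_hz) == (7_000, 8_000):
        return 120
    return 360


def candidate_angles(kk):
    """Test angles in degrees from -90 to +90 at a step of 360/kk."""
    count = kk // 2 + 1          # half of the round angle, both ends included
    return [-90 + i * 360 / kk for i in range(count)]


angles = candidate_angles(division_coefficient(0, 500))
print(len(angles), angles[0], angles[1], angles[-1])
```

With kk = 60 the step is 6°, so 31 directions are tested; with kk = 360 the step is 1° and 181 directions are tested.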

For each value of angle α within a range of -90° to 90° there is, at step 405, calculated a delay, with which sound will arrive to each microphone from a given direction (it is assumed at this stage that a distance from a sound source is infinite; under such an assumption the delay of sound between microphones depends only on direction). A reference point is the leftmost microphone. For angle -90° the delay will have a value calculated by the following equation: $dt = \frac{dx}{V_s}$ where dx is a distance between a given microphone and a reference microphone and $V_s$ is a speed of sound. Next there is calculated a coefficient of delay dependent on sampling: $\mathit{dt2} = dt \cdot F_s$ where $F_s$ is a sampling frequency and subsequently: $\mathit{dt3} = \mathit{dt2} \cdot \sin(90^\circ - \alpha)$ where α is an angle selected for evaluation.
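The three delay formulas can be evaluated directly. The sketch below applies them literally as written, without resolving the angle convention any further; the speed of sound of 343 m/s is an assumed value that the description does not specify, and `delay_in_samples` is a hypothetical name.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def delay_in_samples(dx_m, alpha_deg, fs_hz, vs=SPEED_OF_SOUND_M_S):
    """Delay dt3, in samples, for a microphone dx_m metres from the reference."""
    dt = dx_m / vs                                    # dt = dx / Vs, in seconds
    dt2 = dt * fs_hz                                  # dt2 = dt * Fs, in samples
    dt3 = dt2 * math.sin(math.radians(90.0 - alpha_deg))
    return dt3


# Example: a microphone 100 mm from the reference, sampled at 192 kHz.
print(delay_in_samples(0.1, 0.0, 192_000))
print(delay_in_samples(0.1, 60.0, 192_000))
```

In practice the result would have to be rounded to an integer, or the signal interpolated, before it can index a sample buffer; the description does not say which.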

In order for the values of delay to be above zero, for sounds from the left the leftmost microphone shall be taken as a reference microphone and for sounds from the right, the rightmost microphone shall be taken as a reference microphone.

Subsequently, at step 406, there is calculated a sample of sound for a given direction by adding sound of all 17 microphones taking delays into account (only 17 microphones are active regardless of the sampled frequency). For the reference microphone the delay equals 0 and for the remaining microphones it is derived from their distance to the reference microphone and the angle at which sound approaches. $M_t = \sum_{i=1}^{17} M_{i,\,t+\mathit{dt3}}$ where $M_t$ is a sound sample, i is the number of the microphone (wherein the first microphone is the reference microphone (leftmost or rightmost) and the second is the next active microphone etc.) and t is the number of a sound sample for the reference microphone.

For i-th microphone there is taken into account, as the added value, a sample delayed by dt3 samples with respect to the reference microphone.

In this manner there is obtained, for each direction, a set of sound samples. This is equivalent to having a separate directional microphone for each direction.

A further step is addition of absolute values of sound samples for all directions. The number of samples should be assumed as a result of experiments (verifying accuracy of detection), for example it may be N=200 samples. $SM = \sum_{i=1}^{N} \mathrm{abs}(M_i)$

As a result there is obtained, at step 407, a vector of data comprising information on signal strength arriving from each of the tested directions. A sample graphical representation of the signal strength data is shown in Fig. 6. The plot 601 shows two local maxima 602, 603 that indicate two sound sources: one for an angle of about 45° and the other for an angle of about 135°.
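Steps 406 and 407 together form a delay-and-sum operation followed by a rectified sum. The sketch below illustrates this on synthetic data, assuming the delays have already been rounded to non-negative whole samples; it uses three microphones and N = 32 instead of 17 microphones and N = 200 to stay small, and all names are hypothetical.

```python
import math

PERIOD = 16                      # samples per cycle of the synthetic test tone
TRUE_DELAYS = [0, 2, 4]          # arrival delay at each microphone, in samples


def directional_sample(signals, delays, t):
    """M_t: sum over active microphones of the sample at index t + delay."""
    return sum(signal[t + delay] for signal, delay in zip(signals, delays))


def signal_strength(signals, delays, n_samples):
    """SM: sum of absolute values of n_samples directional samples."""
    return sum(abs(directional_sample(signals, delays, t))
               for t in range(n_samples))


# Synthetic input: the same tone reaches each microphone TRUE_DELAYS samples later.
signals = [[math.sin(2 * math.pi * (k - d) / PERIOD) for k in range(40)]
           for d in TRUE_DELAYS]

matched = signal_strength(signals, TRUE_DELAYS, 32)   # steered at the source
mismatched = signal_strength(signals, [0, 0, 0], 32)  # steered elsewhere
print(matched > mismatched)
```

Evaluating `signal_strength` once per tested angle, each with its own set of delays, yields one value per direction, which is the kind of curve plotted in Fig. 6.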

The curve 601 may be filtered in order to make the local maxima easier to find. The bottom part of the plot may be discarded as noise. It may be assumed that data for further analysis shall be above 1/3 of the maximum value.

Subsequently, at step 408, it is determined for which angles local maxima are present. These angles are the result of identification of directions from which sound sources emit sound (this process determines direction for a single microphone array).
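The detection of local maxima above one third of the peak value can be sketched as below. This is an illustration only: the optional smoothing of the curve is omitted, the two end points of the curve are never reported as maxima, and on a flat top the first point is reported; these choices and the name `detect_peaks` are assumptions rather than part of the description.

```python
def detect_peaks(angles, strengths, floor_ratio=1 / 3):
    """Angles of local maxima whose strength exceeds floor_ratio of the maximum."""
    floor = max(strengths) * floor_ratio
    peaks = []
    for i in range(1, len(strengths) - 1):
        value = strengths[i]
        is_local_max = value > strengths[i - 1] and value >= strengths[i + 1]
        if is_local_max and value > floor:
            peaks.append(angles[i])
    return peaks


angles = list(range(-90, 91, 30))        # -90, -60, ..., 90
strengths = [1, 5, 2, 2.5, 2, 9, 1]      # small bump at 0 degrees is noise
print(detect_peaks(angles, strengths))
```

Here the bump at 0° is a local maximum but lies below one third of the highest value, so only the maxima at -60° and 60° are kept.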

It needs to be taken into account that typical sound sources, e.g. humans, emit sounds having different frequencies. Since the analysis is executed in sub-bands, it may be confirmed at step 409 that a given detection is accurate by comparing results in sub-bands. In case a sound source is present in a plurality of sub-bands where the angles are similar, a resulting angle may be calculated as a weighted average where the lowest sub-band has a lower weight than the other sub-bands. It is to be noted that typically not all sub-bands will be taken into account as a sound source typically does not emit sound in all of them.
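The weighted average over sub-bands can be sketched as follows. The weights 0.5 and 1.0 are illustrative assumptions; the description only requires that the lowest sub-band carry a lower weight than the others. `combine_sub_band_angles` is a hypothetical name.

```python
def combine_sub_band_angles(detections):
    """Weighted mean of (angle_deg, weight) pairs belonging to one source."""
    total_weight = sum(weight for _, weight in detections)
    return sum(angle * weight for angle, weight in detections) / total_weight


# One source seen at similar angles in three sub-bands, lowest sub-band first.
detections = [(44.0, 0.5), (45.0, 1.0), (46.0, 1.0)]
print(combine_sub_band_angles(detections))
```

Deciding which detections are "similar" enough to belong to the same source is left out of this sketch, since the description gives no tolerance for it.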

The method according to the present invention allows for detection of an angle at which a sound source is present with respect to a microphone array. Since the quality of detection depends on microphones placement and sound frequencies, it is necessary to apply division into sub-bands and selection of microphones for each sub-band. Three such microphone arrays allow to detect a sound source in a three-dimensional space with respect to the microphone system.

After obtaining three angles: alpha, beta and gamma, a final sound source location may be determined. For the purpose of this example, there is assumed an arrangement in proximity to the ceiling of a room (Fig. 8 shows an installation of the system in a room) where one microphone array is parallel to the floor in the X axis and the angle it detects is alpha. Similarly, the second microphone array is also parallel to the floor but in the Y axis and the angle it detects is beta. Lastly the third microphone array is perpendicular to the floor, and hence to the other microphone arrays, and the angle it detects is gamma. Thereby the system operates on a Cartesian coordinate system. The alpha, beta and gamma angles point to a location in 3D space from which sound arrives. The zero point is the location of the microphone arrays system. Each of the alpha, beta, gamma angles denotes a plane in a 3D space whereas the planes intersect at a point wherein the sound source is located.
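The description gives no formulas for this final step. One possible reading, offered here only as a sketch, follows from the delay formula: since sin(90° - angle) equals cos(angle), each detected angle can be taken as fixing the component of the arrival direction along that array's axis. Under that assumption three mutually perpendicular arrays give the direction cosines of the sound, that is, a bearing from the zero point rather than a range. The name `arrival_direction` is hypothetical.

```python
import math


def arrival_direction(alpha_deg, beta_deg, gamma_deg):
    """Unit vector (x, y, z) towards the source, from the three array angles."""
    components = [math.cos(math.radians(angle))
                  for angle in (alpha_deg, beta_deg, gamma_deg)]
    norm = math.sqrt(sum(c * c for c in components))
    if norm == 0.0:
        raise ValueError("angles do not define a direction")
    # Normalising absorbs small inconsistencies between the three measurements.
    return tuple(c / norm for c in components)


# A source on the room diagonal makes the same angle with all three axes.
diagonal = math.degrees(math.acos(1 / math.sqrt(3)))
print(arrival_direction(diagonal, diagonal, diagonal))
```

For angles within -90° to +90° all three components are non-negative, which fits a system mounted in a corner with the room lying in one octant.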

Fig. 7 presents a diagram of the system according to the present invention. The system comprises the microphone array arrangement 702 shown in Fig. 3 and an appropriate sampling module 703 managed by a controller 705.

The system may be realized using dedicated components or custom made FPGA or ASIC circuits. The system comprises a data bus 701 communicatively coupled to a memory 704. Additionally, other components of the system are communicatively coupled to the system bus 701 so that they may be managed by the controller 705.

The memory 704 may store computer program or programs executed by the controller 705 in order to execute steps of the method according to the present invention.

Therefore, the controller 705 is configured to execute the steps of the method described with reference to Fig. 4.

The present invention results in a useful determination of sound location that may for example be used in surveillance systems. Such results are concrete and tangible thus not abstract. Therefore, the invention provides a useful, concrete and tangible result.

According to the present invention, data acquired by different microphones are processed within a dedicated machine. Hence, the machine or transformation test is fulfilled and the invention is not abstract.

It can be easily recognized, by one skilled in the art, that the aforementioned method for detecting location of sound sources in a three-dimensional space may be performed and/or controlled by one or more computer programs. Such computer programs are typically executed by utilizing the computing resources in a computing device. Applications are stored on a non-transitory medium. An example of a non-transitory medium is a non-volatile memory, for example a flash memory, or a volatile memory, for example RAM. The computer instructions are executed by a processor. These memories are exemplary recording media for storing computer programs comprising computer-executable instructions performing all the steps of the computer-implemented method according to the technical concept presented herein.

While the invention presented herein has been depicted, described, and has been defined with reference to particular preferred embodiments, such references and examples of implementation in the foregoing specification do not imply any limitation on the invention. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader scope of the technical concept. The presented preferred embodiments are exemplary only, and are not exhaustive of the scope of the technical concept presented herein.

Accordingly, the scope of protection is not limited to the preferred embodiments described in the specification, but is only limited by the claims that follow.