3D Multiple Sound Source Localization by Proposed T-Shaped Circular Distributed Microphone Arrays in Combination with GEVD and Adaptive GCC-PHAT/ML Algorithms
Abstract
1. Introduction
2. Distributed Microphone Array
2.1. Microphone Signal Model in SSL Applications
2.2. The Proposed T-Shaped Circular Distributed Microphone Array for SSL
3. The Proposed SSL Algorithm in Combination with Distributed Microphone Array
4. Results and Discussions
4.1. Data Recording and Simulation Conditions
4.2. The Evaluation’s Scenarios
4.3. The Results on Simulated and Real Data
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ADMM | Alternating direction method of multipliers |
AHB | Acoustical holography beamforming |
AOA | Angle of arrival |
BNP | Bayesian nonparametric |
CC | Cross-correlation |
CMA | Circular microphone array |
DMA | Distributed microphone array |
DNN | Deep neural networks |
DOA | Direction of arrival |
F-CRNN | Full-band recurrent neural networks |
FIR | Finite impulse response |
GCC | Generalized cross-correlation |
GCC-PHAT | Generalized cross-correlation-phase transform |
GCC-PHAT/ML | Generalized cross-correlation-phase transform/maximum likelihood |
GEVD | Generalized eigenvalue decomposition |
IFT | Inverse Fourier transform |
IGMM | Infinite Gaussian mixture model |
LMS | Least mean square |
LTI | Linear time-invariant |
MAEE | Mean absolute estimation error |
ML | Maximum likelihood |
MSE | Mean square error |
MUSIC | Multiple signal classification |
PDF | Power density function |
PHAT | Phase transform |
RIR | Room impulse response |
RT60 | Reverberation time |
SD | Standard deviation |
SF-MCA | Sound field morphological component analysis |
SH | Spherical harmonic |
SHC | Spherical harmonic domain |
SH-TMSBL | Temporal extension of multiple response model of sparse Bayesian learning with spherical harmonic |
SMIPL | Speech, music, and image processing laboratory |
SNR | Signal-to-noise ratio |
SRP | Steered response power |
SRPD | Steered response power density |
SRP-PHAT | Steered response power-phase transform |
SSL | Sound source localization |
TCDMA-AGGPM | T-shaped circular distributed microphone array-adaptive generalized eigenvalue decomposition, generalized cross-correlation-phase transform/maximum likelihood |
TDOA | Time difference of arrival |
TF | Time-frequency |
TIMIT | Texas Instruments and Massachusetts Institute of Technology |
UTEM | Universidad Tecnológica Metropolitana |
VAD | Voice activity detection |
W-DO | Windowed-disjoint orthogonality |
WGN | White Gaussian noise |
WM | Mixture weight |
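The title names GCC-PHAT as a core building block. GCC-PHAT is the generalized cross-correlation of Knapp and Carter [2] with phase-transform weighting. The sketch below is a minimal NumPy version for a single microphone pair. It computes R(τ) = IFT{X2(f) X1*(f) / |X2(f) X1*(f)|} and returns the lag of its peak. It applies only the PHAT weighting. The ML weighting, the adaptive combination of the two, and the GEVD stage that the title refers to are not reproduced here. The function name `gcc_phat`, its parameters, and the 16 kHz sampling rate are illustrative assumptions.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None, eps=1e-12):
    """Estimate the delay of x2 relative to x1, in seconds, with GCC-PHAT.

    A positive value means the sound reaches microphone 2 later than microphone 1.
    """
    n = len(x1) + len(x2)  # zero-padding avoids circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: divide by the magnitude so only phase (delay) information remains
    cross /= np.abs(cross) + eps
    cc = np.fft.irfft(cross, n=n)
    # Optionally restrict the search to physically possible lags
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index i corresponds to lag (i - max_shift)
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Usage: a white-noise source that reaches microphone 2 five samples later
rng = np.random.default_rng(0)
fs = 16000  # assumed sampling rate
s = rng.standard_normal(4096)
x1 = s
x2 = np.concatenate((np.zeros(5), s[:-5]))
print(round(gcc_phat(x1, x2, fs) * fs))  # 5 (samples)
```

Dividing by the cross-spectrum magnitude whitens the signals. This sharpens the correlation peak, which is the usual reason PHAT is preferred in reverberant rooms. The `max_tau` argument lets a caller limit the search to delays the array geometry allows.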
References
1. Lee, R.; Kang, M.S.; Kim, B.H.; Park, K.H.; Lee, S.Q.; Park, H.M. Sound Source Localization Based on GCC-PHAT With Diffuseness Mask in Noisy and Reverberant Environments. IEEE Access 2020, 8, 7373–7382.
2. Knapp, C.; Carter, G. The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 1976, 24, 320–327.
3. Yao, K.; Chen, J.C.; Hudson, R.E. Maximum-likelihood acoustic source localization: Experimental results. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, FL, USA, 13–17 May 2002; pp. 2949–2952.
4. Brandstein, M.; Ward, D. Microphone Arrays: Signal Processing Techniques and Applications; Springer: Berlin, Germany; New York, NY, USA, 2013.
5. Hafezi, S.; Moore, A.H.; Naylor, P.A. Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1956–1968.
6. Yilmaz, O.; Rickard, S. Blind Separation of Speech Mixtures via Time-Frequency Masking. IEEE Trans. Signal Process. 2004, 52, 1830–1847.
7. Li, X.; Girin, L.; Horaud, R.; Gannot, S. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 2171–2186.
8. Hu, Y.; Samarasinghe, P.N.; Abhayapala, T.D.; Gannot, S. Unsupervised Multiple Source Localization Using Relative Harmonic Coefficients. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 571–575.
9. Nadiri, O.; Rafaely, B. Localization of Multiple Speakers under High Reverberation using a Spherical Microphone Array and the Direct-Path Dominance Test. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1494–1505.
10. Hu, Y.; Samarasinghe, P.N.; Abhayapala, T.D. Sound Source Localization Using Relative Harmonic Coefficients in Modal Domain. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 20–23 October 2019; pp. 348–352.
11. Benesty, J. Adaptive eigenvalue decomposition algorithm for passive acoustic source localization. J. Acoust. Soc. Am. 2000, 107, 384–391.
12. Sun, H.; Teutsch, H.; Mabande, E.; Kellermann, W. Robust localization of multiple sources in reverberant environments using EB-ESPRIT with spherical microphone arrays. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 117–120.
13. Vallet, P.; Mestre, X.; Loubaton, P. Performance Analysis of an Improved MUSIC DoA Estimator. IEEE Trans. Signal Process. 2015, 63, 6407–6422.
14. Liaquat, M.U.; Munawar, H.S.; Rahman, A.; Qadir, Z.; Kouzani, A.Z.; Mahmud, M.A.P. Sound Localization for Ad-Hoc Microphone Arrays. Energies 2021, 14, 3446.
15. Jo, B.; Choi, J.W. Direction of arrival estimation using nonsingular spherical ESPRIT. J. Acoust. Soc. Am. 2018, 143, EL181–EL187.
16. Birnie, L.I.; Abhayapala, T.D.; Samarasinghe, P.N. Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 279–293.
17. Williams, E.G. Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography; Academic Press: San Francisco, CA, USA, 1999.
18. Stefanakis, N.; Pavlidi, D.; Mouchtaris, A. Perpendicular Cross-Spectra Fusion for Sound Source Localization with a Planar Microphone Array. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1821–1835.
19. Coteli, M.B.; Olgun, O.; Hacihabiboglu, H. Multiple Sound Source Localization with Steered Response Power Density and Hierarchical Grid Refinement. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 2215–2229.
20. Ma, N.; Gonzalez, J.A.; Brown, G.J. Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 2122–2131.
21. Dai, W.; Chen, H. Multiple Speech Sources Localization in Room Reverberant Environment Using Spherical Harmonic Sparse Bayesian Learning. IEEE Sens. Lett. 2019, 3, 7000304.
22. Yang, B.; Liu, H.; Pang, C.; Li, X. Multiple Sound Source Counting and Localization Based on TF-Wise Spatial Spectrum Clustering. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 1241–1255.
23. Kraljevic, L.; Russo, M.; Stella, M.; Sikora, M. Free-Field TDOA-AOA Sound Source Localization Using Three Soundfield Microphones. IEEE Access 2020, 8, 87749–87761.
24. Chu, N.; Ning, Y.; Yu, L.; Liu, Q.; Huang, Q.; Wu, D.; Hou, P. Acoustic Source Localization in a Reverberant Environment Based on Sound Field Morphological Component Analysis and Alternating Direction Method of Multipliers. IEEE Trans. Instrum. Meas. 2021, 70, 6503413.
25. SongGong, K.; Chen, H.; Wang, W. Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 1864–1880.
26. Hu, Y.; Abhayapala, T.D.; Samarasinghe, P.N. Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 253–264.
27. Stoter, F.R.; Chakrabarty, S.; Edler, B.; Habets, E.A.P. CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 268–282.
28. Dehghan Firoozabadi, A.; Abutalebi, H.R. SRP-ML: A Robust SRP-based speech source localization method for Noisy environments. In Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE), Isfahan, Iran, 11–13 May 2010; pp. 2950–2955.
29. Dehghan Firoozabadi, A.; Irarrazaval, P.; Adasme, P.; Zabala-Blanco, D.; Palacios-Játiva, P.; Durney, H.; Sanhueza, M.; Azurdia-Meza, C. Three-dimensional sound source localization by distributed microphone arrays. In Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 196–200.
30. Doclo, S.; Moonen, M. Robust Adaptive Time Delay Estimation for Speaker Localization in Noisy and Reverberant Acoustic Environments. EURASIP J. Adv. Signal Process. 2003, 2003, 495250.
31. Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S.; Dahlgren, N.L.; Zue, V. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1; Web Download; Linguistic Data Consortium: Philadelphia, PA, USA, 1993; Available online: https://catalog.ldc.upenn.edu/LDC93S1 (accessed on 15 August 2021).
32. Cetin, O.; Shriberg, E. Analysis of overlaps in meetings by dialog factors, hot spots, speakers, and collection site: Insights for automatic speech recognition. In Proceedings of Interspeech, Pittsburgh, PA, USA, 17–21 September 2006; pp. 293–296.
33. Allen, J.B.; Berkley, D.A. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 1979, 65, 943–950.
34. Momenzadeh, H. Speaker Localization Using Microphone Arrays. Master’s Thesis, Yazd University, Yazd, Iran, 2007.
35. Jia, M.; Wu, Y.; Bao, C.; Wang, J. Multiple Sound Sources Localization with Frame-by-Frame Component Removal of Statistically Dominant Source. Sensors 2018, 18, 3613.
Positions | X (cm) | Y (cm) | Z (cm) |
---|---|---|---|
Microphone m1 | 280 | 213.2 | 112 |
Microphone m2 | 277.9 | 212.1 | 112 |
Microphone m3 | 276.8 | 210 | 112 |
Microphone m4 | 277.9 | 207.9 | 112 |
Microphone m5 | 280 | 206.8 | 112 |
Microphone m6 | 282.1 | 207.9 | 112 |
Microphone m7 | 283.2 | 210 | 112 |
Microphone m8 | 282.1 | 212.1 | 112 |
Speaker 1 | 115 | 327 | 183 |
Speaker 2 | 136 | 84 | 165 |
Speaker 3 | 461 | 245 | 174 |
Room dimensions | 560 | 420 | 315 |
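The position table fixes the geometry, and every time difference of arrival (TDOA) follows from it. The sketch below assumes free-field propagation and a speed of sound of 343 m/s; the table states neither. Under those assumptions, the ideal TDOA of each microphone relative to m1 is the path-length difference divided by c. The coordinates are copied from the table. The names `true_tdoa`, `mics`, `speakers`, and `C` are illustrative.

```python
import numpy as np

C = 343.0  # ASSUMED speed of sound in m/s (roughly 20 °C air); not given in the table

# Coordinates from the position table, converted from cm to m
mics = np.array([
    [280.0, 213.2, 112.0], [277.9, 212.1, 112.0], [276.8, 210.0, 112.0],
    [277.9, 207.9, 112.0], [280.0, 206.8, 112.0], [282.1, 207.9, 112.0],
    [283.2, 210.0, 112.0], [282.1, 212.1, 112.0],
]) / 100.0
speakers = np.array([
    [115.0, 327.0, 183.0], [136.0, 84.0, 165.0], [461.0, 245.0, 174.0],
]) / 100.0


def true_tdoa(src, mics, ref=0, c=C):
    """Free-field delay of every microphone relative to microphone `ref`, in seconds."""
    dist = np.linalg.norm(mics - src, axis=1)
    return (dist - dist[ref]) / c


# Largest pairwise microphone distance bounds every possible TDOA
aperture = max(np.linalg.norm(a - b) for a in mics for b in mics)
print(f"aperture = {aperture * 100:.1f} cm, max |TDOA| = {aperture / C * 1e6:.0f} us")
# prints: aperture = 6.4 cm, max |TDOA| = 187 us
for k, src in enumerate(speakers, start=1):
    print(f"speaker {k}:", np.round(true_tdoa(src, mics) * 1e6, 1), "us")
```

The eight microphones span only 6.4 cm, so under these assumptions no direct-path delay across the circle exceeds about 187 µs. At an assumed 16 kHz sampling rate, that is roughly three samples. This is one plausible motivation for combining several compact arrays distributed around the room rather than relying on a single one.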
MAEE (cm), 2 simultaneous speakers | HiGRID [19] S1 | HiGRID [19] S2 | SH-TMSBL [21] S1 | SH-TMSBL [21] S2 | SF-MCA [24] S1 | SF-MCA [24] S2 | TF-MW-BNP-AHB [25] S1 | TF-MW-BNP-AHB [25] S2 | Proposed TCDMA-AGGPM S1 | Proposed TCDMA-AGGPM S2 |
---|---|---|---|---|---|---|---|---|---|---|
Simulated Data | | | | | | | | | | |
Scenario 1 (Reverberant) | 57 | 52 | 45 | 51 | 48 | 43 | 36 | 38 | 32 | 35 |
Scenario 2 (Noisy) | 45 | 41 | 36 | 40 | 39 | 37 | 31 | 34 | 25 | 28 |
Scenario 3 (Noisy-Reverberant) | 74 | 68 | 61 | 67 | 64 | 59 | 47 | 52 | 42 | 45 |
Real Data | | | | | | | | | | |
Scenario 1 (Reverberant) | 61 | 56 | 49 | 55 | 50 | 47 | 39 | 41 | 34 | 37 |
Scenario 2 (Noisy) | 47 | 44 | 39 | 43 | 40 | 41 | 32 | 36 | 30 | 33 |
Scenario 3 (Noisy-Reverberant) | 77 | 73 | 68 | 71 | 68 | 65 | 55 | 58 | 44 | 47 |
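The tables report accuracy as MAEE in cm, but this section does not give the exact formula. For illustration only, the sketch below assumes that MAEE is the mean Euclidean distance between per-frame 3D position estimates and a speaker's true position. The helper `maee_cm` and the estimates are hypothetical. Only the true position of Speaker 1 comes from the position table.

```python
import numpy as np


def maee_cm(estimates, truth):
    """Mean Euclidean distance (cm) between 3D estimates and the true position.

    ASSUMED definition of MAEE, for illustration only; the paper's formula may differ.
    """
    estimates = np.asarray(estimates, dtype=float)  # shape (n_frames, 3)
    truth = np.asarray(truth, dtype=float)          # shape (3,)
    errors = np.linalg.norm(estimates - truth, axis=1)  # per-frame error in cm
    return float(np.mean(errors))


# True position of Speaker 1 from the position table (cm)
truth_s1 = [115.0, 327.0, 183.0]
# Hypothetical per-frame estimates, each off along a single axis
est_s1 = [
    [145.0, 327.0, 183.0],  # 30 cm error along x
    [115.0, 287.0, 183.0],  # 40 cm error along y
    [115.0, 327.0, 133.0],  # 50 cm error along z
]
print(maee_cm(est_s1, truth_s1))  # 40.0
```

MAEE could also be read as a per-coordinate mean absolute error, which would give a different number for the same estimates. The assumed definition therefore matters when interpreting the centimetre values in the tables.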
MAEE (cm), 3 simultaneous speakers | HiGRID [19] S1 | HiGRID [19] S2 | HiGRID [19] S3 | SH-TMSBL [21] S1 | SH-TMSBL [21] S2 | SH-TMSBL [21] S3 | SF-MCA [24] S1 | SF-MCA [24] S2 | SF-MCA [24] S3 | TF-MW-BNP-AHB [25] S1 | TF-MW-BNP-AHB [25] S2 | TF-MW-BNP-AHB [25] S3 | Proposed TCDMA-AGGPM S1 | Proposed TCDMA-AGGPM S2 | Proposed TCDMA-AGGPM S3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Simulated Data | | | | | | | | | | | | | | | |
Scenario 1 (Reverberant) | 48 | 53 | 51 | 44 | 47 | 48 | 41 | 45 | 43 | 33 | 34 | 37 | 27 | 30 | 31 |
Scenario 2 (Noisy) | 46 | 49 | 47 | 41 | 45 | 46 | 39 | 43 | 42 | 32 | 33 | 35 | 26 | 28 | 28 |
Scenario 3 (Noisy-Reverberant) | 71 | 74 | 77 | 68 | 72 | 70 | 62 | 69 | 65 | 51 | 55 | 54 | 41 | 45 | 46 |
Real Data | | | | | | | | | | | | | | | |
Scenario 1 (Reverberant) | 52 | 57 | 55 | 45 | 48 | 50 | 43 | 46 | 44 | 35 | 37 | 38 | 31 | 33 | 34 |
Scenario 2 (Noisy) | 49 | 53 | 51 | 44 | 46 | 49 | 41 | 45 | 40 | 37 | 40 | 43 | 30 | 32 | 31 |
Scenario 3 (Noisy-Reverberant) | 75 | 79 | 78 | 71 | 74 | 73 | 68 | 72 | 70 | 53 | 57 | 59 | 45 | 47 | 48 |
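The sketch below condenses the 3-speaker table. For each scenario it averages MAEE over S1 to S3, then computes the relative reduction of the proposed method over TF-MW-BNP-AHB [25]. TF-MW-BNP-AHB has the lowest speaker-averaged MAEE among the baselines in every row. The averaging is an illustrative choice, not a metric reported in the table, and `percent_reduction` is a hypothetical helper. The values are copied from the table.

```python
# MAEE values (cm) for S1, S2, S3, copied from the 3-speaker table
TF_MW_BNP_AHB = {
    ("Simulated", "Reverberant"): [33, 34, 37],
    ("Simulated", "Noisy"): [32, 33, 35],
    ("Simulated", "Noisy-Reverberant"): [51, 55, 54],
    ("Real", "Reverberant"): [35, 37, 38],
    ("Real", "Noisy"): [37, 40, 43],
    ("Real", "Noisy-Reverberant"): [53, 57, 59],
}
PROPOSED = {
    ("Simulated", "Reverberant"): [27, 30, 31],
    ("Simulated", "Noisy"): [26, 28, 28],
    ("Simulated", "Noisy-Reverberant"): [41, 45, 46],
    ("Real", "Reverberant"): [31, 33, 34],
    ("Real", "Noisy"): [30, 32, 31],
    ("Real", "Noisy-Reverberant"): [45, 47, 48],
}


def percent_reduction(baseline, proposed):
    """Relative drop (%) of the speaker-averaged MAEE versus the baseline."""
    base = sum(baseline) / len(baseline)
    ours = sum(proposed) / len(proposed)
    return 100.0 * (base - ours) / base


reductions = {key: percent_reduction(TF_MW_BNP_AHB[key], PROPOSED[key]) for key in PROPOSED}
for (data, scenario), r in reductions.items():
    print(f"{data:9s} {scenario:17s} {r:5.1f}% lower MAEE")
```

Under this averaging, the reduction ranges from about 11% (real data, Scenario 1) to 22.5% (real data, Scenario 2).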
Run-Time (s) | HiGRID [19] | SH-TMSBL [21] | SF-MCA [24] | TF-MW-BNP-AHB [25] | Proposed TCDMA-AGGPM |
---|---|---|---|---|---|
2 Simultaneous Speakers | |||||
Scenario 1 (Reverberant) | 627 | 530 | 384 | 443 | 245 |
Scenario 2 (Noisy) | 584 | 508 | 352 | 419 | 213 |
Scenario 3 (Noisy-Reverberant) | 665 | 567 | 401 | 468 | 259 |
3 Simultaneous Speakers | |||||
Scenario 1 (Reverberant) | 651 | 559 | 399 | 465 | 262 |
Scenario 2 (Noisy) | 632 | 526 | 374 | 457 | 248 |
Scenario 3 (Noisy-Reverberant) | 683 | 592 | 422 | 476 | 271 |
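The run-time table can be summarized as speed-up ratios. The sketch below divides baseline run times by the proposed TCDMA-AGGPM run time in the same row. It reports the ratio against the fastest and against the slowest baseline. `RUN_TIME_S` and `speedups` are illustrative names, and the values are copied from the table.

```python
# Run times (s) copied from the run-time table, in column order:
# HiGRID [19], SH-TMSBL [21], SF-MCA [24], TF-MW-BNP-AHB [25], proposed TCDMA-AGGPM
RUN_TIME_S = {
    (2, "Reverberant"): [627, 530, 384, 443, 245],
    (2, "Noisy"): [584, 508, 352, 419, 213],
    (2, "Noisy-Reverberant"): [665, 567, 401, 468, 259],
    (3, "Reverberant"): [651, 559, 399, 465, 262],
    (3, "Noisy"): [632, 526, 374, 457, 248],
    (3, "Noisy-Reverberant"): [683, 592, 422, 476, 271],
}

speedups = {}
for key, times in RUN_TIME_S.items():
    *baselines, proposed = times
    # A ratio above 1 means the proposed method finished faster than that baseline
    speedups[key] = (min(baselines) / proposed, max(baselines) / proposed)

for (n_spk, scenario), (vs_fastest, vs_slowest) in speedups.items():
    print(f"{n_spk} speakers, {scenario:17s}: {vs_fastest:.2f}x vs fastest baseline, "
          f"{vs_slowest:.2f}x vs slowest baseline")
```

With these figures, the proposed method is about 1.5 to 1.65 times faster than the fastest baseline (SF-MCA in every row). It is about 2.5 to 2.7 times faster than the slowest baseline (HiGRID). The hardware and signal length behind the timings are not given here, so the ratios are only meaningful within this table.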
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dehghan Firoozabadi, A.; Irarrazaval, P.; Adasme, P.; Zabala-Blanco, D.; Játiva, P.P.; Azurdia-Meza, C. 3D Multiple Sound Source Localization by Proposed T-Shaped Circular Distributed Microphone Arrays in Combination with GEVD and Adaptive GCC-PHAT/ML Algorithms. Sensors 2022, 22, 1011. https://doi.org/10.3390/s22031011