Concurrent speakers localization using blind source separation and microphone array geometry

Khan, Muhammad Umair; Habib, Tania

doi:10.1007/s11045-021-00776-x

Published: 09 May 2021

Volume 32, pages 1159–1184 (2021)

Multidimensional Systems and Signal Processing


Abstract

Speaker localization has been an active topic of research due to its wide range of applications in multimedia and communication technologies. While traditional blind source separation (BSS) algorithms are robust in reverberant environments, they are generally unable to localize more than two concurrent speakers. In this paper, a novel method for localizing concurrent speakers using blind source separation and exploiting the microphone array geometry is presented. The TRINICON BSS algorithm (Buchner et al., in: 2004 IEEE international conference on acoustics, speech, and signal processing, IEEE, 2004) is used as the baseline for determining raw direction of arrival (DOA) estimates. The results show that, by exploiting the redundancy in the microphone array, the proposed algorithm can successfully localize up to three concurrent speakers. The algorithm is evaluated in real-world environments with background noise and reverberation, such as computer labs and meeting rooms. The localization results were compared with the well-known Steered-Response Power Phase Transform (SRP-PHAT) algorithm, using the root mean square error (RMSE) as the evaluation metric. For both the two- and three-concurrent-speaker scenarios, the results show that the proposed algorithm is more stable and robust than SRP-PHAT. Moreover, the proposed algorithm shows potential for tracking multiple simultaneously moving speakers, so it could serve as a front end for a speaker tracking algorithm.
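The abstract does not spell out how the raw per-pair estimates are combined. As a rough illustration only, the Python sketch below assumes a linear array whose microphone pairs all share one axis, far-field sources, and an upstream stage (for example, BSS demixing filters) that already returns one time difference of arrival (TDOA) per source per pair. It converts each TDOA τ for a pair with spacing d into a DOA via θ = arccos(cτ/d), pools the redundant estimates from all pairs, groups them with a simple gap-based 1-D clustering (a simplified stand-in for density-based clustering such as DBSCAN, Ester et al., 1996; not necessarily the authors' procedure), and scores the cluster centres with RMSE, the metric named in the abstract. The function names, the speed of sound, the spacings, and the angle values are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def tdoa_to_doa(tau, d, c=SPEED_OF_SOUND):
    """Far-field DOA (degrees, relative to the pair axis) from a TDOA tau (s)
    and microphone spacing d (m): theta = arccos(c * tau / d)."""
    x = c * tau / d
    x = max(-1.0, min(1.0, x))  # clip: estimation noise can push |x| past 1
    return math.degrees(math.acos(x))


def cluster_1d(values, eps, min_pts):
    """Gap-based 1-D clustering: sort, split wherever consecutive values
    differ by more than eps, and keep only groups with >= min_pts members.
    Returns the mean of each surviving group."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] <= eps:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups if len(g) >= min_pts]


def rmse(estimates, truths):
    """Root mean square error between DOA estimates and ground truth (degrees).
    Pairs them by sorted order, which assumes well-separated sources."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("need one estimate per true source")
    sq = [(e - t) ** 2 for e, t in zip(sorted(estimates), sorted(truths))]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical example: three pairs on one axis (spacings in metres), two
# true sources at 60 and 120 degrees. Raw per-pair DOAs are perturbed by hand
# to mimic estimation noise; one pair also yields a spurious 90-degree peak.
spacings = [0.10, 0.20, 0.30]
raw_angles = [[59.0, 119.0], [61.0, 121.5, 90.0], [60.5, 120.2]]

pooled = []
for d, angles in zip(spacings, raw_angles):
    for a in angles:
        tau = d * math.cos(math.radians(a)) / SPEED_OF_SOUND  # simulated TDOA
        pooled.append(tdoa_to_doa(tau, d))

centers = cluster_1d(pooled, eps=5.0, min_pts=2)  # spurious 90 deg is dropped
error = rmse(centers, [60.0, 120.0])
print(centers, error)
```

Pooling estimates from several pairs lets an isolated spurious peak fall below the min_pts threshold while consistent directions reinforce each other; the eps and min_pts values here are illustrative, not tuned.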

References

Brendel, A., Gannot, S., & Kellermann, W. (2018). Localization of multiple simultaneously active speakers in an acoustic sensor network. In 2018 IEEE 10th sensor array and multichannel signal processing workshop (SAM) (pp. 450–454). IEEE.
Brendel, A., & Kellermann, W. (2017). Localization of multiple simultaneously active sources in acoustic sensor networks using ADP. In 2017 IEEE 7th international workshop on computational advances in multi-sensor adaptive processing (CAMSAP) (pp. 1–5). IEEE.
Buchner, H., Aichner, R., & Kellermann, W. (2004). Trinicon: A versatile framework for multichannel blind signal processing. In 2004 IEEE international conference on acoustics, speech, and signal processing (Vol. 3, pp. 889–892). IEEE.
DiBiase, J., Silverman, H., & Brandstein, M. (2001). Robust localization in reverberant rooms. In Microphone arrays: Signal processing techniques and applications (pp. 157–180). Springer.
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 1996 Int. Conf. knowledge discovery and data mining (KDD’96) (pp. 226–231).
Evers, C., Dorfan, Y., Gannot, S., & Naylor, P. A. (2017). Source tracking using moving microphone arrays for robot audition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6145–6149). IEEE.
Firdaus, S., & Uddin, M. A. (2015). A survey on clustering algorithms and complexity analysis. International Journal of Computer Science Issues, 12(2), 62.
Jian, M., Kot, A. C., & Er, M. (1998). DOA estimation of speech source with microphone arrays. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS’98) (Cat. No. 98CH36187) (Vol. 5, pp. 293–296). IEEE.
Kim, U. H., Nakadai, K., & Okuno, H. G. (2013). Improved sound source localization and front-back disambiguation for humanoid robots with two ears. In International conference on industrial, engineering and other applications of applied intelligent systems (pp. 282–291). Springer.
Kondo, K., Mizuno, Y., Nishino, T., & Takeda, K. (2012). Practically efficient blind speech separation using frequency band selection based on magnitude squared coherence and a small dodecahedral microphone array. Journal of Electrical and Computer Engineering, 2012, 1–11.
Lombard, A., Zheng, Y., Buchner, H., & Kellermann, W. (2010). TDOA estimation for multiple sound sources in noisy and reverberant environments using broadband independent component analysis. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1490–1503.
Lu, Y. C., & Cooke, M. (2011). Motion strategies for binaural localisation of speech sources in azimuth and distance by artificial listeners. Speech Communication, 53(5), 622–642.
Makino, S., Lee, T. W., & Sawada, H. (2007). Blind speech separation. Springer.
Mandel, M. I., & Barker, J. (2016). Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In INTERSPEECH, ISCA (pp. 1991–1995).
Marković, I., & Petrović, I. (2010). Speaker localization and tracking with a microphone array on a mobile robot using von Mises distribution and particle filtering. Robotics and Autonomous Systems, 58(11), 1185–1196.
McDonough Jr, J. W., Leutnant, V. S., Krishna, S. V. S. S. R., & Matsoukas, S., et al. (2017). Determining speaker direction using a spherical microphone array. US Patent 9,560,441.
Nadiri, O., & Rafaely, B. (2014). Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1494–1505.
Nogueira, L. C., & Petraglia, M. R. (2015). Robust localization of multiple sound sources based on BSS algorithms. In 2015 IEEE 24th international symposium on industrial electronics (ISIE) (pp. 579–583). IEEE.
Rickard, S. (2006). Sparse sources are separated sources. In 2006 14th European signal processing conference (pp. 1–5). IEEE.
Schwartz, O., Dorfan, Y., Habets, E. A., & Gannot, S. (2016). Multi-speaker DOA estimation in reverberation conditions using expectation-maximization. In 2016 IEEE international workshop on acoustic signal enhancement (IWAENC) (pp. 1–5). IEEE.
Schwartz, O., Dorfan, Y., Taseska, M., Habets, E. A., & Gannot, S. (2017). DOA estimation in noisy environment with unknown noise power using the EM algorithm. In 2017 Hands-free speech communications and microphone arrays (HSCMA) (pp. 86–90). IEEE.
Schwartz, O., & Gannot, S. (2013). Speaker tracking using recursive EM algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 392–402.
Strobel, N., Spors, S., & Rabenstein, R. (2001). Joint audio-video object localization and tracking. IEEE Signal Processing Magazine, 18(1), 22–31.
Wang, L., Reiss, J. D., & Cavallaro, A. (2016). Over-determined source separation and localization using distributed microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9), 1573–1588.
Zohourian, M., & Martin, R. (2016). Binaural speaker localization and separation based on a joint ITD/ILD model and head movement tracking. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 430–434). IEEE.

Author information

Authors and Affiliations

Department of Computer Engineering, UET Lahore, Lahore, 54890, Pakistan
Muhammad Umair Khan & Tania Habib

Corresponding author

Correspondence to Muhammad Umair Khan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, M.U., Habib, T. Concurrent speakers localization using blind source separation and microphone array geometry. Multidim Syst Sign Process 32, 1159–1184 (2021). https://doi.org/10.1007/s11045-021-00776-x

Received: 09 July 2020
Revised: 06 March 2021
Accepted: 22 March 2021
Published: 09 May 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11045-021-00776-x
