Concurrent speakers localization using blind source separation and microphone array geometry

Multidimensional Systems and Signal Processing

Abstract

Speaker localization has been an active research topic because of its wide range of applications in multimedia and communication technologies. While traditional blind source separation (BSS) algorithms are robust in reverberant environments, they are generally unable to localize more than two concurrent speakers. This paper presents a novel method for localizing concurrent speakers using blind source separation that exploits the microphone array geometry. We use the TRINICON BSS algorithm (Buchner et al., in: 2004 IEEE international conference on acoustics, speech, and signal processing, IEEE, 2004) as the baseline for determining the raw direction-of-arrival (DOA) estimates. The results show that the proposed algorithm can successfully localize up to three concurrent speakers by exploiting the redundancy in the microphone array. The algorithm is evaluated in real-world environments with background noise and reverberation, such as computer labs and meeting rooms. The localization results are compared with those of the well-known Steered-Response Power Phase Transform (SRP-PHAT) algorithm, using the root mean square error as the evaluation metric. The results for the two- and three-concurrent-speaker scenarios show that the proposed algorithm is more stable and robust than SRP-PHAT. Moreover, the proposed algorithm shows the potential to track multiple simultaneous moving speakers, so it can serve as a front end for a speaker tracking algorithm.
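The root mean square error used as the evaluation metric can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, the angles are assumed to be azimuths in degrees, and wrapping the error into [-180, 180) is an assumption made here so that an estimate of 359° against a ground truth of 1° counts as a 2° error rather than 358°.

```python
import math


def angular_error_deg(estimate_deg, truth_deg):
    """Signed difference between two azimuths, wrapped into [-180, 180) degrees.

    The wrap-around is an assumption of this sketch, not a detail taken
    from the paper.
    """
    return (estimate_deg - truth_deg + 180.0) % 360.0 - 180.0


def doa_rmse_deg(estimates_deg, truths_deg):
    """Root mean square error between paired DOA estimates and ground truth."""
    if len(estimates_deg) != len(truths_deg) or not estimates_deg:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        angular_error_deg(e, t) ** 2
        for e, t in zip(estimates_deg, truths_deg)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(doa_rmse_deg([10.0, 20.0], [13.0, 16.0]))
```

With several concurrent speakers, each estimate would first have to be paired with the speaker it belongs to before this metric applies; how that pairing is done is outside the scope of this sketch.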

Notes

  1. https://github.com/UmairKhanUET/bss-concurrentspeaker-localization/tree/master/corpus.

  2. http://wiki.seeedstudio.com/ReSpeaker_Core_v2.0/.

References

  • Brendel, A., Gannot, S., & Kellermann, W. (2018). Localization of multiple simultaneously active speakers in an acoustic sensor network. In 2018 IEEE 10th sensor array and multichannel signal processing workshop (SAM) (pp. 450–454). IEEE.

  • Brendel, A., & Kellermann, W. (2017). Localization of multiple simultaneously active sources in acoustic sensor networks using ADP. In 2017 IEEE 7th international workshop on computational advances in multi-sensor adaptive processing (CAMSAP) (pp. 1–5). IEEE.

  • Buchner, H., Aichner, R., & Kellermann, W. (2004). Trinicon: A versatile framework for multichannel blind signal processing. In 2004 IEEE international conference on acoustics, speech, and signal processing (Vol. 3, pp. 889–892). IEEE.

  • DiBiase, J., Silverman, H., & Brandstein, M. (2001). Robust localization in reverberant rooms. In Microphone arrays: Signal processing techniques and applications (pp. 157–180). Springer.

  • Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 1996 Int. Conf. knowledge discovery and data mining (KDD’96) (pp. 226–231).

  • Evers, C., Dorfan, Y., Gannot, S., & Naylor, P. A. (2017). Source tracking using moving microphone arrays for robot audition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6145–6149). IEEE.

  • Firdaus, S., & Uddin, M. A. (2015). A survey on clustering algorithms and complexity analysis. International Journal of Computer Science Issues, 12(2), 62.

  • Jian, M., Kot, A. C., & Er, M. (1998). DOA estimation of speech source with microphone arrays. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS’98) (Cat. No. 98CH36187) (Vol. 5, pp. 293–296). IEEE.

  • Kim, U. H., Nakadai, K., & Okuno, H. G. (2013). Improved sound source localization and front-back disambiguation for humanoid robots with two ears. In International conference on industrial, engineering and other applications of applied intelligent systems (pp. 282–291). Springer.

  • Kondo, K., Mizuno, Y., Nishino, T., & Takeda, K. (2012). Practically efficient blind speech separation using frequency band selection based on magnitude squared coherence and a small dodecahedral microphone array. Journal of Electrical and Computer Engineering, 2012, 1–11.

  • Lombard, A., Zheng, Y., Buchner, H., & Kellermann, W. (2010). TDOA estimation for multiple sound sources in noisy and reverberant environments using broadband independent component analysis. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1490–1503.

  • Lu, Y. C., & Cooke, M. (2011). Motion strategies for binaural localisation of speech sources in azimuth and distance by artificial listeners. Speech Communication, 53(5), 622–642.

  • Makino, S., Lee, T. W., & Sawada, H. (2007). Blind speech separation. Springer.

  • Mandel, M. I., & Barker, J. (2016). Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In INTERSPEECH (pp. 1991–1995). ISCA.

  • Marković, I., & Petrović, I. (2010). Speaker localization and tracking with a microphone array on a mobile robot using von Mises distribution and particle filtering. Robotics and Autonomous Systems, 58(11), 1185–1196.

  • McDonough Jr, J. W., Leutnant, V. S., Krishna, S. V. S. S. R., Matsoukas, S., et al. (2017). Determining speaker direction using a spherical microphone array. US Patent 9,560,441.

  • Nadiri, O., & Rafaely, B. (2014). Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1494–1505.

  • Nogueira, L. C., & Petraglia, M. R. (2015). Robust localization of multiple sound sources based on BSS algorithms. In 2015 IEEE 24th international symposium on industrial electronics (ISIE) (pp. 579–583). IEEE.

  • Rickard, S. (2006). Sparse sources are separated sources. In 2006 14th European signal processing conference (pp. 1–5). IEEE.

  • Schwartz, O., Dorfan, Y., Habets, E. A., & Gannot, S. (2016). Multi-speaker DOA estimation in reverberation conditions using expectation-maximization. In 2016 IEEE international workshop on acoustic signal enhancement (IWAENC) (pp. 1–5). IEEE.

  • Schwartz, O., Dorfan, Y., Taseska, M., Habets, E. A., & Gannot, S. (2017). DOA estimation in noisy environment with unknown noise power using the EM algorithm. In 2017 Hands-free speech communications and microphone arrays (HSCMA) (pp. 86–90). IEEE.

  • Schwartz, O., & Gannot, S. (2013). Speaker tracking using recursive EM algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 392–402.

  • Strobel, N., Spors, S., & Rabenstein, R. (2001). Joint audio-video object localization and tracking. IEEE Signal Processing Magazine, 18(1), 22–31.

  • Wang, L., Reiss, J. D., & Cavallaro, A. (2016). Over-determined source separation and localization using distributed microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9), 1573–1588.

  • Zohourian, M., & Martin, R. (2016). Binaural speaker localization and separation based on a joint ITD/ILD model and head movement tracking. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 430–434). IEEE.

Author information

Corresponding author

Correspondence to Muhammad Umair Khan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, M.U., Habib, T. Concurrent speakers localization using blind source separation and microphone array geometry. Multidim Syst Sign Process 32, 1159–1184 (2021). https://doi.org/10.1007/s11045-021-00776-x
