A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers

Circuits, Systems, and Signal Processing

Abstract

This paper addresses the localization of simultaneous speakers. The work builds on generalized cross-correlation (GCC)-based methods for estimating the directions of multiple speakers. To address the shortcomings of GCC-based direction of arrival (DOA) estimation, we apply three modifications to our previous subband processing-based system for localizing simultaneous speakers. First, the DOA estimator is equipped with a front-end block that determines the number of speakers using K-means clustering and the silhouette criterion; this block provides the DOA estimator with the number of speakers. Second, to eliminate spatial aliasing, we propose a novel nested circular microphone array in which each microphone pair is used only in the subband appropriate to its inter-microphone distance. Third, to overcome the weakness of the GCC-phase transform (GCC-PHAT) in noisy and noisy-reverberant conditions, we propose an SNR estimation block. This block lets the system distinguish noisy from reverberant conditions, applying the PHAT filter in reverberant conditions and the maximum likelihood filter in noisy ones. The proposed method has been evaluated on both simulated and real multi-speaker speech data under various environmental conditions and with different numbers of speakers. In terms of DOA accuracy, the evaluations show that the proposed method outperforms the fullband and baseline subband methods.
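The pairing of microphone spacings with subbands can be illustrated with a minimal sketch. It assumes the standard half-wavelength condition for avoiding spatial aliasing, d <= c / (2 f), so a pair with spacing d is alias-free up to f = c / (2 d). The function names, the speed of sound, the spacings, and the subband edges below are hypothetical values chosen for illustration and are not taken from the paper, whose actual selection rule may differ (for example, by preferring the largest admissible spacing in each subband).

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_alias_free_frequency(distance_m, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) at which a pair spaced distance_m apart
    still satisfies the half-wavelength condition d <= c / (2 f)."""
    return c / (2.0 * distance_m)


def assign_pairs_to_subbands(pair_distances, subband_edges, c=SPEED_OF_SOUND):
    """Map each subband (low_hz, high_hz) to the names of the pairs whose
    alias-free limit is at least the subband's upper edge."""
    assignment = {}
    for low, high in subband_edges:
        assignment[(low, high)] = [
            name
            for name, distance in pair_distances.items()
            if max_alias_free_frequency(distance, c) >= high
        ]
    return assignment


# Hypothetical spacings (metres) and subband edges (Hz), for illustration only.
pairs = {"inner": 0.02, "middle": 0.04, "outer": 0.08}
bands = [(0, 2000), (2000, 4000), (4000, 8000)]
result = assign_pairs_to_subbands(pairs, bands)

for band, names in result.items():
    print(band, names)
```

With these assumed values, the widest pair is admitted only in the lowest subband, while the narrowest pair remains alias-free in all three.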

Author information

Corresponding author

Correspondence to Hamid Reza Abutalebi.

About this article

Cite this article

Dehghan Firoozabadi, A., Abutalebi, H.R. A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers. Circuits Syst Signal Process 35, 573–601 (2016). https://doi.org/10.1007/s00034-015-0077-6

  • DOI: https://doi.org/10.1007/s00034-015-0077-6
