A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers

Dehghan Firoozabadi, Ali; Abutalebi, Hamid Reza

doi:10.1007/s00034-015-0077-6

Published: 19 May 2015

Volume 35, pages 573–601, (2016)

Circuits, Systems, and Signal Processing

Ali Dehghan Firoozabadi¹ &
Hamid Reza Abutalebi¹

Abstract

This paper addresses the localization of simultaneous speakers. The work builds on generalized cross-correlation (GCC)-based methods for estimating the directions of multiple speakers. To address the shortcomings of GCC-based direction of arrival (DOA) estimation methods, we apply three modifications to our previous subband processing-based system for the localization of simultaneous speakers. First, the DOA estimation method is equipped with a front-end block that determines the number of speakers based on K-means clustering and the silhouette criterion; this block provides the number of speakers to the DOA estimator. Second, to eliminate spatial aliasing, we propose a novel nested circular microphone array in which each microphone pair is used only in the subband appropriate to its inter-microphone distance. Third, to overcome the weakness of the GCC-phase transform (GCC-PHAT) in noisy and noisy-reverberant conditions, we propose an SNR estimation block. This lets us distinguish noisy from reverberant conditions and use the PHAT filter in reverberant conditions and the maximum likelihood filter in noisy ones. The proposed method has been evaluated on both simulated and real multi-speaker speech data under various environmental conditions and with different numbers of speakers. Our evaluations in terms of DOA accuracy demonstrate the superiority of the proposed method over the fullband and baseline subband methods.
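
The idea of using each microphone pair only in the subband that suits its inter-microphone distance can be illustrated with the standard half-wavelength condition for spatial aliasing: a pair with spacing d is alias-free up to roughly c / (2d). The sketch below is not the authors' array design; the ring radius, the number of microphones, the subband edges, the speed of sound and all function names are assumptions chosen for illustration only.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def chord_length(radius, n_mics, i, j):
    """Distance between microphones i and j on a uniform circular ring."""
    return 2.0 * radius * math.sin(math.pi * abs(i - j) / n_mics)


def max_alias_free_freq(distance, c=SPEED_OF_SOUND):
    """Highest frequency at which the spacing is at most half a wavelength."""
    return c / (2.0 * distance)


def pairs_for_subband(radius, n_mics, f_high):
    """Pairs of one ring that stay alias-free up to the subband edge f_high (Hz)."""
    return [
        (i, j)
        for i, j in combinations(range(n_mics), 2)
        if max_alias_free_freq(chord_length(radius, n_mics, i, j)) >= f_high
    ]


if __name__ == "__main__":
    # Hypothetical ring: 8 microphones on a 5 cm radius, two example subband edges.
    for f_high in (2000.0, 4000.0):
        pairs = pairs_for_subband(0.05, 8, f_high)
        print(f"up to {f_high:.0f} Hz: {len(pairs)} usable pairs")
```

Under these assumed values, only adjacent microphones qualify for the higher subband, while the lower subband also admits pairs that are two positions apart. Applying the same selection to rings of different radii gives a rough picture of why a nested geometry offers usable pairs in every subband.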

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, Yazd University, Pajuhesh St., Safaieh, Postal Box: 89195-741, Yazd, Iran
Ali Dehghan Firoozabadi & Hamid Reza Abutalebi

Corresponding author

Correspondence to Hamid Reza Abutalebi.

About this article

Cite this article

Dehghan Firoozabadi, A., Abutalebi, H.R. A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers. Circuits Syst Signal Process 35, 573–601 (2016). https://doi.org/10.1007/s00034-015-0077-6

Received: 12 December 2014
Revised: 04 May 2015
Accepted: 05 May 2015
Published: 19 May 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s00034-015-0077-6
