Abstract
This paper addresses the localization of simultaneous speakers using generalized cross-correlation (GCC)-based methods for estimating the direction of arrival (DOA) of multiple speakers. To address the shortcomings of GCC-based DOA estimation, we apply three modifications to our previous subband processing-based system for localizing simultaneous speakers. First, the DOA estimator is equipped with a front-end block that determines the number of speakers using K-means clustering and the silhouette criterion; this block supplies the number of speakers to the DOA estimator. Second, to eliminate spatial aliasing, we propose a novel nested circular microphone array in which each microphone pair is used only in the subband appropriate to its inter-microphone distance. Third, to overcome the weakness of the GCC-phase transform (GCC-PHAT) in noisy and noisy-reverberant conditions, we propose a signal-to-noise ratio (SNR) estimation block that separates noisy from reverberant conditions, so that the PHAT filter is used in reverberant conditions and the maximum likelihood filter in noisy ones. The proposed method is evaluated on both simulated and real multi-speaker speech data under various environmental conditions and with different numbers of speakers. In terms of DOA accuracy, the evaluations show that the proposed method outperforms the fullband and baseline subband methods.
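The link between inter-microphone distance and usable subband can be illustrated with the standard spatial-aliasing bound: a pair with spacing d is alias-free only up to f = c / (2d). The following is a minimal sketch of that constraint, not the array design from the paper; the function names, the speed of sound, the spacings, and the band edges are all assumptions chosen for illustration, and the abstract does not state the actual pair-selection rule.

```python
# Illustrative only: names, spacings and band edges are hypothetical,
# not taken from the paper.
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_alias_free_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) at which a pair with this spacing avoids
    spatial aliasing, using the half-wavelength rule d <= c / (2 f)."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


def assign_pairs_to_subbands(pair_spacings_m, subbands_hz, c=SPEED_OF_SOUND):
    """Map each (low, high) subband to the indices of the microphone pairs
    whose alias-free limit covers the upper edge of that subband."""
    assignment = {}
    for band in subbands_hz:
        _, high = band
        assignment[band] = [
            i
            for i, d in enumerate(pair_spacings_m)
            if max_alias_free_frequency(d, c) >= high
        ]
    return assignment


if __name__ == "__main__":
    spacings = [0.02, 0.08, 0.32]                    # metres (assumed)
    bands = [(50, 500), (500, 2000), (2000, 8000)]   # Hz (assumed)
    for band, pairs in assign_pairs_to_subbands(spacings, bands).items():
        print(band, "-> eligible pairs", pairs)
```

With these assumed values, only the closest pair is eligible for the highest band, while all pairs are eligible for the lowest. A practical design would then typically prefer the widest eligible pair in each band, since larger spacing gives finer delay resolution.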
Cite this article
Dehghan Firoozabadi, A., Abutalebi, H.R. A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers. Circuits Syst Signal Process 35, 573–601 (2016). https://doi.org/10.1007/s00034-015-0077-6
DOI: https://doi.org/10.1007/s00034-015-0077-6