Abstract
This work presents a method for localizing and counting multiple sound sources that exploits a relaxed sparsity of speech signals. A soundfield microphone is used in place of a conventional microphone array to avoid the array's redundancy and complexity. After establishing an effective measure, we investigate the relaxed sparsity of speech signals. This relaxed sparsity supports the broad assumption that “single-source” zones always exist among the soundfield microphone signals, an assumption validated by statistical analysis. Based on the detection of “single-source” zones, the proposed method jointly estimates the number of active sources and their corresponding directions of arrival (DOAs) by applying a peak-searching approach to the normalized histogram of the estimated DOAs. Estimating the DOA only in these “single-source” zones resolves the cross distortions caused by multiple simultaneously active sources. Evaluations show that the proposed method achieves higher accuracy in DOA estimation and source counting than existing techniques. The proposed method also has higher efficiency and lower complexity, which makes it suitable for real-time applications.
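The histogram peak-search step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `count_and_localize`, the 5-degree bin width, the circular moving-average smoothing, and the 0.05 peak threshold are all assumptions chosen for illustration, and the input is assumed to be a set of azimuth estimates already obtained from “single-source” zones.

```python
import numpy as np

def count_and_localize(doa_deg, bin_width=5, smooth_bins=3, min_height=0.05):
    """Hypothetical helper: jointly estimate source count and DOAs.

    doa_deg     : per-zone azimuth estimates in degrees (assumed input)
    bin_width   : histogram bin width in degrees (illustrative value)
    smooth_bins : odd length of the circular moving average (illustrative)
    min_height  : minimum normalized height for a peak (illustrative)
    """
    doa = np.mod(np.asarray(doa_deg, dtype=float), 360.0)
    n_bins = int(360 // bin_width)
    hist, edges = np.histogram(doa, bins=n_bins, range=(0.0, 360.0))
    hist = hist / max(hist.sum(), 1)  # normalized histogram of DOA estimates

    # Circular moving average: azimuth wraps around at 360 degrees,
    # so bins are shifted with np.roll instead of zero-padded.
    half = smooth_bins // 2
    smooth = sum(np.roll(hist, s) for s in range(-half, half + 1))
    smooth = smooth / (2 * half + 1)

    # A bin is a peak if it rises above its left neighbour, is not below
    # its right neighbour, and clears the height threshold.
    left = np.roll(smooth, 1)
    right = np.roll(smooth, -1)
    is_peak = (smooth > left) & (smooth >= right) & (smooth >= min_height)

    centers = 0.5 * (edges[:-1] + edges[1:])
    peak_doas = centers[np.flatnonzero(is_peak)]  # ascending azimuth
    return len(peak_doas), peak_doas.tolist()
```

Calling `count_and_localize(estimates)` returns the number of detected peaks and the bin-center azimuth of each one. The resolution is limited by the bin width, and the threshold trades missed weak sources against spurious peaks from outlier estimates.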
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61231015, 61201197), the Specialized Research Fund for the Doctoral Program of Higher Education of the People’s Republic of China (No. 20121103120017), and the Beijing Postdoctoral Research Foundation.
Cite this article
Jia, M., Sun, J. & Bao, C. Real-time multiple sound source localization and counting using a soundfield microphone. J Ambient Intell Human Comput 8, 829–844 (2017). https://doi.org/10.1007/s12652-016-0388-x
DOI: https://doi.org/10.1007/s12652-016-0388-x