Abstract
Robotic auditory attention mainly relies on sound source localization using a microphone array. Typically, the robot detects a sound source whenever one emits, estimates its direction, and turns toward that direction to pay attention. In scenarios where multiple sound sources emit simultaneously, however, the robot may have difficulty selecting a single target source. This paper proposes a novel robot auditory attention system based on source distance perception (e.g., selecting the closest of the localized sources). The microphone array consists of a head-array and a base-array installed in the robot's head and base, respectively. The difficulty of attending among multiple sound sources is resolved by estimating a binary mask for each source from the azimuth localization of the head-array. For each source represented by a binary mask, the elevation angles at the head- and base-arrays are estimated and triangulated to obtain the source's distance to the robot. Finally, the closest source is determined and its direction is used to control the robot. Experimental results on real indoor recordings of two and three simultaneous sound sources, as well as a real-time demonstration at a robot exhibition, clearly show the benefit of the proposed system.
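The distance-from-elevation step can be illustrated with a minimal geometric sketch. This is a hypothetical formulation, not the paper's exact method: assuming the head- and base-arrays are vertically separated by a known baseline and each reports an elevation angle toward the same source, the horizontal distance follows from the difference of the two rays' slopes.

```python
import math

def triangulate_distance(elev_head, elev_base, baseline):
    """Estimate horizontal source distance (m) by triangulating two
    elevation angles (radians) measured at the head- and base-arrays,
    which are separated vertically by `baseline` meters.

    Geometry (illustrative assumption): with the source at horizontal
    distance d and the arrays on the same vertical axis,
        tan(elev_base) - tan(elev_head) = baseline / d.
    """
    denom = math.tan(elev_base) - math.tan(elev_head)
    if abs(denom) < 1e-9:
        # Equal slopes: the two rays are parallel and never intersect.
        raise ValueError("parallel rays: distance unresolvable")
    return baseline / denom

# Example: source 2.0 m away at 1.0 m height; head array at 1.2 m,
# base array at 0.2 m (vertical baseline 1.0 m).
d_true, z_src = 2.0, 1.0
elev_head = math.atan2(z_src - 1.2, d_true)  # elevation seen from head array
elev_base = math.atan2(z_src - 0.2, d_true)  # elevation seen from base array
est = triangulate_distance(elev_head, elev_base, baseline=1.0)

# Attention selection: among per-source distance estimates (one per
# binary mask), attend to the closest source.
distances = {"src_a": est, "src_b": 3.4, "src_c": 2.7}
target = min(distances, key=distances.get)
```

The closing `min` over per-source distances mirrors the paper's selection rule: the source names and the other distance values here are placeholders for the estimates produced per binary mask.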
Cite this article
Nguyen, Q., Choi, J. Selection of the Closest Sound Source for Robot Auditory Attention in Multi-source Scenarios. J Intell Robot Syst 83, 239–251 (2016). https://doi.org/10.1007/s10846-015-0313-0