
Selection of the Closest Sound Source for Robot Auditory Attention in Multi-source Scenarios


Abstract

Robotic auditory attention mainly relies on sound source localization with a microphone array. Typically, the robot detects a sound source whenever it emits, estimates its direction, and then turns toward that direction to pay attention. In scenarios where multiple sound sources emit simultaneously, however, the robot may have difficulty selecting a single target source. This paper proposes a novel robot auditory attention system based on source distance perception, namely selection of the closest among the localized sources. The microphone setup consists of a head-array and a base-array installed in the robot’s head and base, respectively. The difficulty of attending to one source among several is resolved by estimating a binary mask for each source from the azimuth localization of the head-array. For each source represented by a binary mask, the elevation angles at the head- and base-arrays are estimated and triangulated to obtain the source’s distance to the robot. Finally, the closest source is determined and its direction is used to control the robot. Experimental results on real indoor recordings of two and three simultaneous sound sources, as well as a real-time demonstration at a robot exhibition, clearly show the benefit of the proposed system.
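To make the distance step concrete, the following is a minimal geometric sketch in Python, not the paper’s implementation: it assumes the two arrays are vertically separated by a known baseline and that each reports an elevation angle toward a masked source, so the horizontal range follows from triangulation and the closest source is the one with the smallest range. The function names, dictionary fields, and the 1 m baseline are illustrative assumptions.

```python
import math

def triangulate_range(elev_head_deg, elev_base_deg, baseline_m=1.0):
    """Estimate the horizontal range of a source from the elevation angles
    observed at the head- and base-arrays.

    Elevations are measured from each array's horizontal plane
    (positive = source above the array). baseline_m is the assumed
    vertical separation between the two arrays.
    """
    t_head = math.tan(math.radians(elev_head_deg))
    t_base = math.tan(math.radians(elev_base_deg))
    denom = t_base - t_head
    if abs(denom) < 1e-6:          # near-parallel rays: range is unreliable
        return float("inf")
    return baseline_m / denom

def closest_source(sources, baseline_m=1.0):
    """Return (azimuth, range) of the source with the smallest triangulated range.

    sources: list of dicts with the azimuth (deg) estimated by the head-array
    and the elevations (deg) estimated at both arrays for one masked source.
    """
    ranged = [(triangulate_range(s["elev_head"], s["elev_base"], baseline_m), s)
              for s in sources]
    rng, src = min(ranged, key=lambda x: x[0])
    return src["azimuth"], rng

# Made-up example: two simultaneous talkers; the robot would turn toward
# the azimuth of the closer one (about 1.8 m vs. 4.7 m here).
talkers = [
    {"azimuth":  30.0, "elev_head": -5.0, "elev_base": 25.0},   # closer
    {"azimuth": -60.0, "elev_head": -2.0, "elev_base": 10.0},   # farther
]
print(closest_source(talkers))
```

In the proposed system the azimuth of the selected source then drives the robot’s attention turn; the numeric values above are made-up examples only.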



Author information

Correspondence to Quang Nguyen.


About this article


Cite this article

Nguyen, Q., Choi, J. Selection of the Closest Sound Source for Robot Auditory Attention in Multi-source Scenarios. J Intell Robot Syst 83, 239–251 (2016). https://doi.org/10.1007/s10846-015-0313-0
