Abstract
Robotic auditory attention mainly relies on sound source localization using a microphone array. Typically, the robot detects a sound source whenever one emits, estimates its direction, and turns toward that direction to pay attention. In scenarios where multiple sound sources emit simultaneously, however, the robot may have difficulty selecting a single target source. This paper proposes a novel robot auditory attention system based on source distance perception (e.g., selecting the closest of the localized sources). The microphone array consists of a head-array and a base-array installed in the robot's head and base, respectively. The difficulty of attending among multiple sound sources is resolved by estimating a binary mask for each source from the azimuth localization of the head-array. For each source represented by a binary mask, the elevation angles at the head- and base-arrays are estimated and triangulated to obtain the source's distance to the robot. Finally, the closest source is determined and its direction is used to control the robot. Experimental results on real indoor recordings of two and three simultaneous sound sources, as well as a real-time demonstration at a robot exhibition, clearly show the benefit of the proposed system.
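The distance-from-elevation step can be illustrated with a minimal geometric sketch. This is a hypothetical formulation, not the paper's exact method: assuming the head- and base-arrays are vertically separated by a known baseline and each reports an elevation angle toward the same source, the horizontal distance follows from the difference of the two rays' slopes.

```python
import math

def triangulate_distance(elev_head, elev_base, baseline):
    """Estimate horizontal source distance (m) by triangulating two
    elevation angles (radians) measured at the head- and base-arrays,
    which are separated vertically by `baseline` meters.

    Geometry (illustrative assumption): with the source at horizontal
    distance d and the arrays on the same vertical axis,
        tan(elev_base) - tan(elev_head) = baseline / d.
    """
    denom = math.tan(elev_base) - math.tan(elev_head)
    if abs(denom) < 1e-9:
        # Equal slopes: the two rays are parallel and never intersect.
        raise ValueError("parallel rays: distance unresolvable")
    return baseline / denom

# Example: source 2.0 m away at 1.0 m height; head array at 1.2 m,
# base array at 0.2 m (vertical baseline 1.0 m).
d_true, z_src = 2.0, 1.0
elev_head = math.atan2(z_src - 1.2, d_true)  # elevation seen from head array
elev_base = math.atan2(z_src - 0.2, d_true)  # elevation seen from base array
est = triangulate_distance(elev_head, elev_base, baseline=1.0)

# Attention selection: among per-source distance estimates (one per
# binary mask), attend to the closest source.
distances = {"src_a": est, "src_b": 3.4, "src_c": 2.7}
target = min(distances, key=distances.get)
```

The closing `min` over per-source distances mirrors the paper's selection rule: the source names and the other distance values here are placeholders for the estimates produced per binary mask.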
Cite this article
Nguyen, Q., Choi, J. Selection of the Closest Sound Source for Robot Auditory Attention in Multi-source Scenarios. J Intell Robot Syst 83, 239–251 (2016). https://doi.org/10.1007/s10846-015-0313-0