Abstract
This paper presents a novel method for enabling a robot to determine the position of a sound source in three dimensions using just two microphones and interaction with its environment. The method uses the Parameter-Less Self-Organising Map (PLSOM) algorithm and Reinforcement Learning (RL) to achieve rapid, accurate response. We also introduce a method for directional filtering using the PLSOM. The presented system is compared to a similar system to evaluate its performance.
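The abstract names the PLSOM but does not state its update rule. As a rough illustration only, the sketch below follows the general PLSOM idea of replacing the usual learning-rate and neighbourhood schedules with a value scaled by the current map error. The class name `PLSOM`, the parameters `beta` and `theta_min`, the one-dimensional map layout, and the linear neighbourhood-size function are all assumptions made for this example. It is not the authors' implementation and omits the audio processing and reinforcement learning parts of the system.

```python
import math
import random


class PLSOM:
    """Minimal 1-D PLSOM-style map (illustrative sketch, not the paper's code)."""

    def __init__(self, n_nodes, dim, beta, theta_min=1.0, seed=0):
        rng = random.Random(seed)
        # Assumption: weights start at random points in the unit hypercube.
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
        self.beta = beta            # upper bound on the neighbourhood size
        self.theta_min = theta_min  # lower bound on the neighbourhood size
        self.r = 0.0                # largest winner error seen so far

    def winner(self, x):
        """Return (index, distance) of the node whose weight is closest to x."""
        dists = [math.dist(x, w) for w in self.weights]
        c = min(range(len(dists)), key=dists.__getitem__)
        return c, dists[c]

    def update(self, x):
        """Apply one training step for input x and return the scaling value eps."""
        c, err = self.winner(x)
        self.r = max(self.r, err)
        # eps is the winner error normalised by the largest error seen so far.
        eps = err / self.r if self.r > 0 else 0.0
        # Assumption: neighbourhood size grows linearly with eps.
        theta = (self.beta - self.theta_min) * eps + self.theta_min
        for i, w in enumerate(self.weights):
            h = math.exp(-((i - c) ** 2) / theta ** 2)  # Gaussian neighbourhood
            for k in range(len(w)):
                w[k] += eps * h * (x[k] - w[k])
        return eps


# Train a 10-node map on uniformly random 2-D inputs.
som = PLSOM(n_nodes=10, dim=2, beta=5.0)
rng = random.Random(1)
for _ in range(2000):
    som.update([rng.random(), rng.random()])
```

Because both the step size and the neighbourhood width are derived from `eps`, no annealing schedule has to be tuned. A poorly fitted input produces a large, wide update, while a well-fitted one barely moves the map.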
Cite this article
Berglund, E., Sitte, J. & Wyeth, G. Active audition using the parameter-less self-organising map. Auton Robot 24, 401–417 (2008). https://doi.org/10.1007/s10514-008-9084-9
DOI: https://doi.org/10.1007/s10514-008-9084-9