Active audition using the parameter-less self-organising map

Autonomous Robots

Abstract

This paper presents a novel method for enabling a robot to determine the position of a sound source in three dimensions using just two microphones and interaction with its environment. The method uses the Parameter-Less Self-Organising Map (PLSOM) algorithm and Reinforcement Learning (RL) to achieve rapid, accurate response. We also introduce a method for directional filtering using the PLSOM. The presented system is compared to a similar system to evaluate its performance.
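For readers unfamiliar with the PLSOM, the following is a minimal sketch of a PLSOM-style training step as the algorithm is commonly described: the learning rate and neighbourhood width are derived from the current fitting error, scaled by the largest error seen so far, rather than from an annealing schedule. It is not the implementation used in this paper. The class name `PLSOM1D`, the one-dimensional map layout, the Gaussian neighbourhood, the default value of `beta`, and the small lower bound on the neighbourhood width are all assumptions made for illustration.

```python
import math
import random


class PLSOM1D:
    """Toy 1-D PLSOM-style map (illustrative only; all names are hypothetical)."""

    def __init__(self, n_nodes, dim, beta=None, seed=0):
        rng = random.Random(seed)
        # Random initial weight vectors in the unit hypercube.
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
        # beta bounds the neighbourhood width (assumed default: the map size).
        self.beta = beta if beta is not None else float(n_nodes)
        # Largest winner error observed so far; used to normalise the error.
        self.rho = 0.0

    def winner(self, x):
        """Return (index, Euclidean distance) of the best-matching node."""
        dists = [math.dist(x, w) for w in self.weights]
        c = min(range(len(dists)), key=dists.__getitem__)
        return c, dists[c]

    def train_step(self, x):
        """Update the map with one input and return the scaling variable eps."""
        c, err = self.winner(x)
        self.rho = max(self.rho, err)
        # eps lies in [0, 1]: a large relative error gives a large update.
        eps = err / self.rho if self.rho > 0 else 0.0
        # Neighbourhood width shrinks with eps; the floor avoids division by zero.
        theta = max(self.beta * eps, 1e-9)
        for i, w in enumerate(self.weights):
            h = math.exp(-((i - c) ** 2) / theta ** 2)
            for k in range(len(w)):
                w[k] += eps * h * (x[k] - w[k])
        return eps
```

In this sketch a poorly fitted input (eps near 1) moves many nodes strongly, while a well fitted input (eps near 0) causes almost no change, which is the property that removes the need for a hand-tuned learning-rate schedule.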

Author information

Corresponding author

Correspondence to Erik Berglund.

About this article

Cite this article

Berglund, E., Sitte, J. & Wyeth, G. Active audition using the parameter-less self-organising map. Auton Robot 24, 401–417 (2008). https://doi.org/10.1007/s10514-008-9084-9

  • DOI: https://doi.org/10.1007/s10514-008-9084-9
