
Sound source localization for auditory perception of a humanoid robot using deep neural networks

  • Original Article
Neural Computing and Applications

Abstract

This paper presents a deep neural network approach to estimating the location of a sound source, providing auditory perception for a humanoid robot. Estimating the position of a moving sound source is crucial for a humanoid robot's functionality in environments where its camera cannot operate, especially in recovery scenarios with no visual contact. In this study, sound from a source around the robot was recorded by four microphones placed on the humanoid robot's head. A wheeled robot carried the sound source, and its odometry provided the source position. The recorded sound dataset and the collected odometry dataset were used as input data and target data, respectively. The discrete wavelet transform (DWT) was applied to pre-process the input data. The resulting matrices were then fed to the proposed convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (biLSTM), and multilayer perceptron (MLP) networks to estimate the location of the sound source around the humanoid robot. Across all tests of the estimation models built with the proposed networks, the biLSTM structure achieved an \(R^2\) of approximately 0.97. This study showed experimentally that, like many living creatures, humanoid robots can sense the position of a sound source in their environment with sufficient accuracy.
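The abstract states only that a DWT was applied to the four-channel recordings before they were passed to the networks. It does not give the wavelet family, the decomposition level, or the layout of the resulting matrices. The following is a minimal, stdlib-only Python sketch under assumed choices: a Haar wavelet, a three-level decomposition, and a hypothetical layout with one row per microphone holding that channel's concatenated coefficients. The names `haar_wavedec` and `dwt_feature_matrix` are illustrative and are not taken from the paper.

```python
import math
from typing import List, Tuple

Signal = List[float]


def haar_dwt_step(x: Signal) -> Tuple[Signal, Signal]:
    # One level of the orthonormal Haar DWT (assumed wavelet): scaled pairwise
    # sums form the approximation band, scaled pairwise differences the detail band.
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_wavedec(x: Signal, level: int) -> List[Signal]:
    # Multi-level decomposition, ordered [cA_level, cD_level, ..., cD_1].
    coeffs: List[Signal] = []
    approx = list(x)
    for _ in range(level):
        approx, detail = haar_dwt_step(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def dwt_feature_matrix(channels: List[Signal], level: int = 3) -> List[Signal]:
    # Hypothetical layout: one row per microphone, each row holding the
    # concatenated DWT coefficients of that channel's audio frame.
    matrix: List[Signal] = []
    for ch in channels:
        row: Signal = []
        for band in haar_wavedec(ch, level):
            row.extend(band)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Four microphones, one 8-sample frame each (frame length must be
    # divisible by 2**level).
    frame = [[math.sin(0.3 * n + k) for n in range(8)] for k in range(4)]
    m = dwt_feature_matrix(frame, level=3)
    print(len(m), "x", len(m[0]))
```

Because the Haar filters used here are orthonormal, each row has exactly as many coefficients as the input frame has samples and preserves its energy. In practice a library such as PyWavelets (`pywt.wavedec`) would typically replace the hand-written transform; the Haar version is used only to keep the sketch short and self-contained.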
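The reported figure of merit is the coefficient of determination, \(R^2 = 1 - SS_{res}/SS_{tot}\), where \(SS_{res}\) is the sum of squared prediction errors and \(SS_{tot}\) the sum of squared deviations of the targets from their mean. The abstract does not say how \(R^2\) was aggregated over the position outputs, so the sketch below assumes the odometry targets are planar \((x, y)\) positions and reports \(R^2\) separately for each coordinate. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the targets are constant")
    return 1.0 - ss_res / ss_tot


def r2_per_coordinate(
    true_xy: Sequence[Tuple[float, float]],
    pred_xy: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    # Assumed target format: (x, y) source positions from odometry.
    xs_t, ys_t = zip(*true_xy)
    xs_p, ys_p = zip(*pred_xy)
    return r2_score(xs_t, xs_p), r2_score(ys_t, ys_p)
```

A value of 1.0 means a perfect fit, while predicting the target mean everywhere gives 0.0. On that scale, the biLSTM's approximately 0.97 means its predictions leave only about 3% of the target variance unexplained.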


Fig. 1 to Fig. 10 (images not reproduced)


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The author thanks the FIRAT University Scientific Research Projects Unit (FUBAP) for financial support of the current study (Project No: TEKF.21.24).

Author information


Corresponding author

Correspondence to G. Boztas.

Ethics declarations

Conflict of interest

The author has no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Boztas, G. Sound source localization for auditory perception of a humanoid robot using deep neural networks. Neural Comput & Applic 35, 6801–6811 (2023). https://doi.org/10.1007/s00521-022-08047-x



  • DOI: https://doi.org/10.1007/s00521-022-08047-x

Keywords