Abstract
This paper presents a method for estimating sound source location with deep neural networks in order to give a humanoid robot auditory perception. Estimating the position of a moving sound source is crucial for a humanoid robot operating in environments where its camera cannot be used, and it is especially important in recovery scenarios with no visual contact. In this study, sound from a source moving around the robot was recorded by four microphones placed on the humanoid robot’s head. A wheeled robot was used to provide the sound source, and its odometry gave the source position. The recorded sound dataset and the collected odometry dataset were used as input data and target data, respectively. The discrete wavelet transform (DWT) was applied to pre-process the input data. The resulting matrices were then fed to the proposed convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (biLSTM), and multilayer perceptron (MLP) networks to estimate the sound source location around the humanoid robot. Across all tests of the estimation models built with the proposed networks, the biLSTM structure achieved an \(R^2\) of approximately 0.97. The study shows experimentally that a humanoid robot, like many living creatures, can sense the position of a sound source in its environment with sufficient accuracy.
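The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state the wavelet family or the number of decomposition levels, so the Haar wavelet and `levels=2` are assumptions, and the function names `haar_dwt`, `dwt_features`, and `r2_score` are hypothetical. The sketch turns four microphone channels into a small coefficient matrix and computes the \(R^2\) metric used to compare the models.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail) coefficients."""
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    scale = 1.0 / math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    return approx, detail


def dwt_features(channels, levels=2):
    """Build a matrix with one row of approximation coefficients per microphone.

    `levels` is an assumed setting, not a value taken from the paper.
    """
    features = []
    for channel in channels:
        approx = list(channel)
        for _ in range(levels):
            approx, _detail = haar_dwt(approx)
        features.append(approx)
    return features


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic stand-in for four microphone recordings (8 samples each).
    mics = [[math.sin(0.3 * n + k) for n in range(8)] for k in range(4)]
    matrix = dwt_features(mics, levels=2)
    print(len(matrix), "rows x", len(matrix[0]), "coefficients")

    # Synthetic stand-in for odometry targets and network predictions.
    print(r2_score([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]))
```

In practice the matrix returned by `dwt_features` would be the network input and the odometry positions the regression target; an \(R^2\) close to 1 means the predicted positions explain nearly all of the variance in the measured ones.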
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The author would like to thank the FIRAT University Scientific Research Projects Unit (FUBAP) for its financial support of the current study (Project No: TEKF.21.24).
Ethics declarations
Conflict of interest
The authors have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
Boztas, G. Sound source localization for auditory perception of a humanoid robot using deep neural networks. Neural Comput & Applic 35, 6801–6811 (2023). https://doi.org/10.1007/s00521-022-08047-x
DOI: https://doi.org/10.1007/s00521-022-08047-x