Abstract
In this study, a novel sound localization approach is proposed that provides the 3D coordinates of a real moving speaker. Sound recordings of a real user in an indoor environment were used for the study. Four conventional microphones simultaneously recorded speech signals as the user moved between 14 predetermined locations. To separate environmental noise from the recorded sound signals and accurately determine the origin of speech, a z-score-based peak detection approach is used. The delays between the acquired speech signals are calculated with the generalized cross-correlation phase transform (GCC-PHAT) approach. The determined delays are transformed into a special distance matrix, and each of these matrices is assigned to a particular speaker location in 3D space. A novel lightweight deep regression network based on a convolutional neural network was constructed to learn the relationship between these distance matrices and the real 3D location information. As a result, the sound localization problem is transformed from an iterative solution into a regression problem. With the low-cost conventional microphones and hardware used in this approach, the position of a moving speaker is determined with higher accuracy than with the particle swarm optimization-based time difference of arrival approach. According to the performance comparison, the average localization deviation of 45.826 cm obtained with the time difference of arrival-based sound source localization approach was reduced to 16.298 cm with the proposed approach.
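The delay-estimation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gcc_phat_delay` and `delay_distance_matrix`, the speed of sound of 343 m/s, and the antisymmetric pairwise layout of the matrix are all assumptions made for illustration, since the abstract does not specify how the "special distance matrix" is built.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay in seconds of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is not circular
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_distance_matrix(signals, fs, c=343.0):
    """Build a pairwise matrix of path-length differences in metres.

    Entry (i, j) is c times the delay of microphone i relative to
    microphone j. This layout is an assumption, not the paper's definition.
    """
    n_mics = len(signals)
    dist = np.zeros((n_mics, n_mics))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            tau = gcc_phat_delay(signals[i], signals[j], fs)
            dist[i, j] = c * tau
            dist[j, i] = -c * tau
    return dist
```

For four microphones this gives a 4 x 4 matrix per speech segment, which could then be paired with the known 3D position of the speaker to form one training example for a regression model.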
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Catalbas, M.C., Dobrisek, S. Dynamic speaker localization based on a novel lightweight R–CNN model. Neural Comput & Applic 35, 10589–10603 (2023). https://doi.org/10.1007/s00521-023-08251-3
DOI: https://doi.org/10.1007/s00521-023-08251-3