
Dynamic speaker localization based on a novel lightweight R–CNN model

  • Original Article
  • Neural Computing and Applications

Abstract

In this study, a novel sound localization approach is proposed that estimates the 3D coordinates of a real, moving speaker. The study uses sound recordings of a real user in an indoor environment. Four conventional microphones simultaneously recorded speech signals as the user moved between 14 predetermined locations. A z-score-based peak detection approach is used to separate environmental noise from the recorded signals and to accurately detect where the speech occurs. The delays between the acquired speech signals are calculated with the generalized cross-correlation phase transform (GCC-PHAT) method. The estimated delays are transformed into a special distance matrix, and each matrix is assigned to a particular speaker location in 3D space. A novel lightweight deep regression network based on a convolutional neural network was constructed to learn the relationship between these distance matrices and the true 3D locations. Sound localization is thereby reformulated from an iterative solution into a regression problem. Using low-cost conventional microphones and hardware, the proposed approach determines the position of the moving speaker more accurately than a particle swarm optimization (PSO)-based time difference of arrival (TDOA) approach: the average localization deviation of 45.826 cm obtained with the TDOA-based approach was reduced to 16.298 cm with the proposed approach, a reduction of about 64%.
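The signal-processing front end named in the abstract (z-score peak detection, GCC-PHAT delay estimation, and conversion of delays into a distance matrix) can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names `zscore_peaks`, `gcc_phat`, and `delay_distance_matrix` are hypothetical; the peak detector follows the widely used smoothed z-score algorithm by J.P. van Brakel with illustrative parameters; and the matrix layout (entry (i, j) equals the speed of sound, assumed to be c = 343 m/s, times the delay of microphone i relative to microphone j) is only a guess at the "special distance matrix", whose exact form the abstract does not specify.

```python
import numpy as np


def zscore_peaks(y, lag, threshold, influence):
    """Smoothed z-score peak detection (van Brakel's algorithm).

    Returns +1 / -1 for samples more than `threshold` standard deviations
    above / below the moving mean of the previous `lag` filtered samples, else 0.
    Parameter values are illustrative assumptions, not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    signals = np.zeros(len(y), dtype=int)
    filtered = y.copy()  # peaks are damped here so they do not inflate the baseline
    for i in range(lag, len(y)):
        window = filtered[i - lag:i]
        mean, std = window.mean(), window.std()
        if abs(y[i] - mean) > threshold * std:
            signals[i] = 1 if y[i] > mean else -1
            # Only a fraction `influence` of the peak enters the running statistics.
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
    return signals


def gcc_phat(sig, ref, fs, max_tau=None):
    """Delay of `sig` relative to `ref` in seconds, estimated with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    `max_tau` optionally limits the search to physically possible delays.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Rearrange so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_distance_matrix(signals, fs, c=343.0, max_tau=None):
    """Hypothetical M x M matrix of range differences in metres.

    Entry (i, j) = c * (delay of mic i relative to mic j), i.e. how much
    farther the source is from mic i than from mic j. The matrix is antisymmetric.
    """
    m = len(signals)
    dist = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            dist[i, j] = c * gcc_phat(signals[i], signals[j], fs, max_tau)
            dist[j, i] = -dist[i, j]
    return dist
```

In a pipeline like the one described, `zscore_peaks` would presumably be applied to a frame-level energy envelope to keep only the speech portions, and each resulting M × M matrix (4 × 4 for four microphones) would become one input sample for the regression network, with the known 3D location as its target.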


Figs. 1–14 (images not included)


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Author information

Corresponding author

Correspondence to Mehmet Cem Catalbas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Catalbas, M.C., Dobrisek, S. Dynamic speaker localization based on a novel lightweight R–CNN model. Neural Comput & Applic 35, 10589–10603 (2023). https://doi.org/10.1007/s00521-023-08251-3


  • DOI: https://doi.org/10.1007/s00521-023-08251-3
