Dynamic speaker localization based on a novel lightweight R–CNN model

Catalbas, Mehmet Cem; Dobrisek, Simon

doi:10.1007/s00521-023-08251-3

Original Article
Published: 21 January 2023

Volume 35, pages 10589–10603, (2023)

Neural Computing and Applications

Abstract

This study proposes a novel sound localization approach that estimates the 3D coordinates of a real moving speaker. Sound recordings from a real indoor environment were used. Four conventional microphones simultaneously recorded speech signals as the user moved between 14 predetermined locations. A z-score-based peak detection approach is used to separate environmental noise from the recorded signals and to accurately determine the origin of speech. The delays between the acquired speech signals are calculated with the generalized cross-correlation with phase transform (GCC-PHAT) approach. The estimated delays are transformed into a special distance matrix, and each of these matrices is assigned to a particular speaker location in 3D space. A novel lightweight convolutional neural network-based deep regression network was constructed to learn the relationship between these distance matrices and the real 3D location information. As a result, the sound localization problem is transformed from an iterative solution into a regression problem. Using low-cost conventional microphones and hardware, the approach determines the position of a moving speaker with higher accuracy than the particle swarm optimization-based time difference of arrival approach. In the performance comparison, the average localization deviation of 45.826 cm obtained with the time difference of arrival-based sound source localization approach was reduced to 16.298 cm with the proposed approach.
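The delay-estimation step described in the abstract can be illustrated with a minimal Python sketch of GCC-PHAT. This is not the authors' implementation: the function names `gcc_phat` and `delay_distance_matrix`, the speed of sound of 343 m/s, and in particular the antisymmetric pairwise layout of the matrix are assumptions made for illustration only, since the abstract does not specify the form of the "special distance matrix".

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_distance_matrix(signals, fs):
    """Build a hypothetical pairwise matrix of path-length differences (metres).

    Entry (i, j) is the GCC-PHAT delay of microphone i relative to
    microphone j, multiplied by the assumed speed of sound.
    """
    m = len(signals)
    dist = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            tau = gcc_phat(signals[i], signals[j], fs)
            dist[i, j] = tau * SPEED_OF_SOUND
            dist[j, i] = -dist[i, j]
    return dist
```

For four microphones, `delay_distance_matrix` returns a 4×4 array that could serve as the input to a regression network; the actual matrix construction and network architecture are described only in the full article.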

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

1st Organized Industrial Zone Vocational School, Department of Electronics and Automation, Ankara University, Ankara, Turkey
Mehmet Cem Catalbas
Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
Simon Dobrisek

Corresponding author

Correspondence to Mehmet Cem Catalbas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Catalbas, M.C., Dobrisek, S. Dynamic speaker localization based on a novel lightweight R–CNN model. Neural Comput & Applic 35, 10589–10603 (2023). https://doi.org/10.1007/s00521-023-08251-3

Received: 13 April 2022
Accepted: 06 January 2023
Published: 21 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00521-023-08251-3
