Abstract
Sound Event Localization and Detection (SELD) is the task of spatially and temporally localizing various sound events and classifying them. Commonly, multitask models are used to perform SELD. In this work, a deep learning model named channel-wise ‘FusionNet’ is designed to perform the SELD task. A novel fusion layer is introduced into the regular Deep Neural Network (DNN): the input is fed channel-wise, and the outputs of all channels are fused to form a new feature representation. The key contribution of this work is a neural network model that retains the channel-wise information of the multichannel input along with the spatial and temporal information. The proposed network uses separable convolution blocks in its convolutional layers, keeping the model's complexity low in terms of both time and space. The input features are Mel-band energies for Sound Event Detection (SED) and intensity vectors for Direction-of-Arrival (DOA) estimation. The proposed network's fusion layer provides a better feature representation for both the SED and DOA estimation tasks. Experiments are performed on recordings in the First-order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. Improved performance in terms of Error Rate (ER), DOA error, and Frame Recall (FR) is observed in comparison with state-of-the-art SELD systems.
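As an illustrative sketch only: the channel-wise processing and fusion idea described in the abstract can be expressed in plain NumPy. The kernel sizes, the shared per-branch weights, and concatenation as the fusion operator below are assumptions for illustration, not the paper's exact FusionNet configuration.

```python
import numpy as np

def depthwise_separable_conv(x, depth_kernels, point_weights):
    """Depthwise separable 2-D convolution, valid padding.
    x: (H, W, C) feature map; depth_kernels: (k, k, C), one spatial
    filter per channel; point_weights: (C, F) pointwise (1x1) mixing."""
    k = depth_kernels.shape[0]
    H, W, C = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    depth_out = np.zeros((out_h, out_w, C))
    for c in range(C):          # depthwise step: filter each channel alone
        for i in range(out_h):
            for j in range(out_w):
                depth_out[i, j, c] = np.sum(
                    x[i:i + k, j:j + k, c] * depth_kernels[:, :, c])
    return depth_out @ point_weights  # pointwise step mixes channels

def channelwise_fusion(channels, depth_kernels, point_weights):
    """Run each input channel through its own branch (shared weights here
    for brevity), then fuse the branch outputs by concatenation."""
    feats = [depthwise_separable_conv(ch[:, :, None], depth_kernels,
                                      point_weights) for ch in channels]
    return np.concatenate(feats, axis=-1)

# Toy input: 4 ambisonic channels of an 8x8 time-frequency patch.
rng = np.random.default_rng(0)
fused = channelwise_fusion([rng.standard_normal((8, 8)) for _ in range(4)],
                           rng.standard_normal((3, 3, 1)),
                           rng.standard_normal((1, 2)))
print(fused.shape)  # fused representation keeps per-channel features
```

The separable factorization also illustrates the complexity claim: a k x k depthwise filter plus a 1 x 1 pointwise projection costs k^2 C + C F weights, versus k^2 C F for a standard convolution with the same output width.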





Data Availability
The datasets discussed in the manuscript are publicly available for research purposes. (URL: https://dcase.community/workshop2020/proceedings.) [28]
References
Adavanne S, Parascandolo G, Pertilä P, et al (2016) Sound event detection in multichannel audio using spatial and harmonic features. In: Workshop on detection and classification of acoustic scenes and events, pp 6–10
Adavanne S, Politis A, Nikunen J et al (2018) Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE J Sel Topics Signal Process 13(1):34–48
Akbacak M, Hansen JH (2007) Environmental sniffing: noise knowledge estimation for robust speech systems. IEEE Transactions on Audio, Speech, and Language Processing 15(2):465–477
Aletta F, Kang J, Astolfi A et al (2016) Differences in soundscape appreciation of walking sounds from different footpath materials in urban parks. Sustain Cities Soc 27:367–376
Benesty J, Chen J, Huang Y (2004) Time-delay estimation via linear interpolation and cross correlation. IEEE Trans on Speech Audio Process 12(5):509–519
Cakir E, Heittola T, Huttunen H, et al (2015) Polyphonic sound event detection using multi label deep neural networks. In: The international joint conference on neural networks (IJCNN). IEEE, pp 1–7
Cakır E, Parascandolo G, Heittola T et al (2017) Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(6):1291–1303
Cao Y, Iqbal T, Kong Q, et al (2020) Event-independent network for polyphonic sound event localization and detection. Tech. rep., DCASE2020 Challenge
Carletti V, Foggia P, Percannella G, et al (2013) Audio surveillance using a bag of aural words classifier. In: the 10th International conference on advanced video and signal based surveillance. IEEE, pp 81–86
Chakrabarty S, Habets E (2017) Multi-speaker localization using convolutional neural network trained with noise. In: Workshop on machine learning for audio processing, pp 1–5
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: The IEEE conference on computer vision and pattern recognition (CVPR), pp 1800–1807
Chu S, Narayanan S, Kuo CJ, et al (2006) Where am I? Scene recognition for mobile robots using audio features. In: The International conference on multimedia and expo. IEEE, pp 885–888
DiBiase JH (2000) A High-accuracy, Low-latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays. Brown University Providence, RI
DiBiase JH, Silverman HF, Brandstein MS (2001) Robust localization in reverberant rooms. In: Microphone arrays. Springer, p 157–180
Hayashi T, Watanabe S, Toda T et al (2017) Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(11):2059–2070
Hirvonen T (2015) Classification of spatial audio location and content using convolutional neural networks. In: Audio Engineering Society Convention 138
Huang Y, Benesty J, Elko GW et al (2001) Real-time passive source localization: A practical linear-correction least-squares approach. IEEE Trans Speech Audio Process 9(8):943–956
Huang Z, Liu C, Fei H et al (2020) Urban sound classification based on 2-order dense convolutional network using dual features. Appl Acoust 164:107243
Jayalakshmi S, Chandrakala S, Nedunchelian R (2018) Global statistical features-based approach for acoustic event detection. Appl Acoust 139:113–118
Jeong IY, Lee S, Han Y, et al (2017) Audio event detection using multiple-input convolutional neural network. Detection and Classification of Acoustic Scenes and Events (DCASE) pp 51–54
Kapka S, Lewandowski M (2019) Sound source detection, localization and classification using consecutive ensemble of CRNN models. Tech. rep., Detection Classification Acoustic Scenes Events Workshop
LiHong P, Xue Z, Ping C, et al (2019) Polyphonic sound event detection and localization using a two-stage strategy. Tech. rep., Detection Classification Acoustic Scenes Events Workshop
Lopatka K, Kotus J, Czyzewski A (2016) Detection, classification and localization of acoustic events in the presence of background noise for acoustic surveillance of hazardous situations. Multimed Tools Appl 75(17):10407–10439
Mesaros A, Heittola T, Eronen A, et al (2010) Acoustic event detection in real life recordings. In: The 18th european signal processing conference. IEEE, pp 1267–1271
Mesaros A, Adavanne S, Politis A, et al (2019) Joint measurement of localization and detection of sound events. In: IEEE Workshop on applications of signal processing to audio and acoustics (WASPAA)
Phan H, Hertel L, Maass M, et al (2016) Robust audio event recognition with 1-max pooling convolutional neural networks. arXiv
Phan H, Pham L, Koch P, et al (2020) Audio event detection and localization with multitask regression network. Tech. rep., DCASE2020 Challenge
Politis A, Adavanne S, Virtanen T (2020) A dataset of reverberant spatial sound scenes with moving sources for sound event localization and detection. In: Proceedings of the workshop on detection and classification of acoustic scenes and events
Politis A, Mesaros A, Adavanne S et al (2020) Overview and evaluation of sound event localization and detection in DCASE 2019. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29:684–698
Spoorthy V, Koolagudi SG (2023) Polyphonic sound event detection using Mel-Pseudo constant Q-Transform and deep neural network. IETE Journal of Research pp 1–13
Spoorthy V, Koolagudi SG (2023) A transpose-SELDNet for polyphonic sound event localization and detection. In: 2023 IEEE 8th international conference for convergence in technology (I2CT). IEEE, pp 1–6
Wang Q, Wu H, Jing Z, et al (2020) The USTC-IFLYTEK system for sound event localization and detection of DCASE2020 challenge. Tech. rep., DCASE2020 Challenge
Weiping Z, Jiantao Y, Xiaotao X, et al (2017) Acoustic scene classification using deep convolutional neural network and multiple spectrograms fusion. Detection and Classification of Acoustic Scenes and Events (DCASE)
Zhang H, McLoughlin I, Song Y (2015) Robust sound event recognition using convolutional neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 559–563
Zöhrer M, Pernkopf F (2017) Virtual adversarial training and data augmentation for acoustic event detection with gated recurrent neural networks. In: Interspeech, pp 493–497
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Conceptualization: Spoorthy. V, Shashidhar G. Koolagudi; Methodology: Spoorthy. V; Formal analysis and investigation: Spoorthy. V; Writing - original draft preparation: Spoorthy. V; Writing - review and editing: Shashidhar G. Koolagudi; Supervision: Shashidhar G. Koolagudi
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
V., S., Koolagudi, S.G. Polyphonic sound event localization and detection using channel-wise FusionNet. Appl Intell 54, 5015–5026 (2024). https://doi.org/10.1007/s10489-024-05438-6