Abstract
In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN combines two special filters, known as the left and right projection layers, with a Kervolutional Neural Network (KNN). The KNN replaces the linear operation of a Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend ProKNN to learn, in the initial stage, from the features of the two channels of the audio recordings. The performance of ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms existing systems, with absolute accuracy improvements of 8% and 14% on the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge.
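To make the kervolution operation concrete: where a convolution computes the linear response ⟨patch, w⟩ for each filter position, a kervolution with a polynomial kernel computes (⟨patch, w⟩ + c)^d. The sketch below is a minimal, illustrative NumPy implementation of this single-channel 2D operation (the function name `kervolution2d` and the kernel parameters `cp`, `dp` are our labels for exposition); it is not the authors' ProKNN implementation, which additionally includes the projection layers and learned filters.

```python
import numpy as np

def kervolution2d(x, w, cp=1.0, dp=3):
    """Single-channel 2D kervolution with a polynomial kernel.

    Replaces the linear patch response sum(patch * w) of a plain
    (cross-correlation style) convolution with the non-linear
    response (sum(patch * w) + cp) ** dp.

    x  : 2D input array (e.g. a spectrogram patch)
    w  : 2D filter
    cp : additive constant of the polynomial kernel
    dp : degree of the polynomial kernel
    """
    kh, kw = w.shape
    oh = x.shape[0] - kh + 1  # "valid" output height
    ow = x.shape[1] - kw + 1  # "valid" output width
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = (np.sum(patch * w) + cp) ** dp
    return out
```

With `cp=0` and `dp=1` the operation reduces exactly to an ordinary (cross-correlation) convolution, which is why kervolution can be seen as a strict generalization of the convolutional layer.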
Data Availability
The datasets used in this study are publicly available in the Zenodo repository at the following addresses:
– TUT Urban Acoustic Scenes 2018 development dataset: https://doi.org/10.5281/zenodo.1228142
– TUT Urban Acoustic Scenes Mobile 2018 development dataset: https://doi.org/10.5281/zenodo.1228235
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Mulimani, M., Nandi, R. & Koolagudi, S.G. Acoustic scene classification using projection Kervolutional neural network. Multimed Tools Appl 82, 9447–9457 (2023). https://doi.org/10.1007/s11042-022-13763-6