
Acoustic scene classification using projection Kervolutional neural network

Published in: Multimedia Tools and Applications

Abstract

In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN combines two special filters, the left and right projection layers, with a Kervolutional Neural Network (KNN). A KNN replaces the linear operation of the Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend ProKNN to learn, in its initial stage, from the features of the two channels of audio recordings. The performance of ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms existing systems, with absolute accuracy improvements of 8% and 14% on the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge.
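The kervolution operation named in the abstract replaces the linear patch–filter inner product of an ordinary convolution with a non-linear kernel response; with the polynomial kernel, each patch response ⟨patch, w⟩ becomes (⟨patch, w⟩ + c)^d. A minimal single-channel NumPy sketch, assuming a polynomial kernel with offset `cp` and degree `dp` (the function name and the single-filter, no-stride, no-padding setup are illustrative, not the authors' implementation):

```python
import numpy as np

def kervolve2d(x, w, cp=1.0, dp=2):
    """2D kervolution with a polynomial kernel: the patch-filter
    inner product of plain convolution is replaced by the
    non-linear response (<patch, w> + cp) ** dp."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            s = np.sum(x[i:i + kh, j:j + kw] * w)  # linear part, as in a CNN
            out[i, j] = (s + cp) ** dp             # polynomial kernel response
    return out
```

Setting `cp=0` and `dp=1` recovers the plain (linear) cross-correlation of a CNN layer, which makes the CNN a special case of this family.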



Data Availability

The datasets used in this study are publicly available at the Zenodo repository at the following addresses.

– TUT Urban Acoustic Scenes 2018 development dataset: https://doi.org/10.5281/zenodo.1228142

– TUT Urban Acoustic Scenes Mobile 2018 development dataset: https://doi.org/10.5281/zenodo.1228235



Author information

Correspondence to Manjunath Mulimani.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mulimani, M., Nandi, R. & Koolagudi, S.G. Acoustic scene classification using projection Kervolutional neural network. Multimed Tools Appl 82, 9447–9457 (2023). https://doi.org/10.1007/s11042-022-13763-6


