Abstract
In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN combines two special filters, known as the left and right projection layers, with a Kervolutional Neural Network (KNN). The KNN replaces the linear operation of a Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend ProKNN to learn, in the initial stage, from the features of the two channels of the audio recordings. The performance of ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms existing systems, with absolute accuracy improvements of 8% and 14% on the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge.
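To make the kervolution operation concrete: where a convolution computes the linear response ⟨patch, w⟩ for each filter position, a kervolution with a polynomial kernel computes (⟨patch, w⟩ + c)^d. The sketch below is a minimal, illustrative NumPy implementation of this single-channel 2D operation (the function name `kervolution2d` and the kernel parameters `cp`, `dp` are our labels for exposition); it is not the authors' ProKNN implementation, which additionally includes the projection layers and learned filters.

```python
import numpy as np

def kervolution2d(x, w, cp=1.0, dp=3):
    """Single-channel 2D kervolution with a polynomial kernel.

    Replaces the linear patch response sum(patch * w) of a plain
    (cross-correlation style) convolution with the non-linear
    response (sum(patch * w) + cp) ** dp.

    x  : 2D input array (e.g. a spectrogram patch)
    w  : 2D filter
    cp : additive constant of the polynomial kernel
    dp : degree of the polynomial kernel
    """
    kh, kw = w.shape
    oh = x.shape[0] - kh + 1  # "valid" output height
    ow = x.shape[1] - kw + 1  # "valid" output width
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = (np.sum(patch * w) + cp) ** dp
    return out
```

With `cp=0` and `dp=1` the operation reduces exactly to an ordinary (cross-correlation) convolution, which is why kervolution can be seen as a strict generalization of the convolutional layer.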
Data Availability
The datasets used in this study are publicly available in the Zenodo repository at the following addresses:
– TUT Urban Acoustic Scenes 2018 development dataset: https://doi.org/10.5281/zenodo.1228142
– TUT Urban Acoustic Scenes Mobile 2018 development dataset: https://doi.org/10.5281/zenodo.1228235
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Mulimani, M., Nandi, R. & Koolagudi, S.G. Acoustic scene classification using projection Kervolutional neural network. Multimed Tools Appl 82, 9447–9457 (2023). https://doi.org/10.1007/s11042-022-13763-6