Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

Spoorthy, Venkatesh; Koolagudi, Shashidhar G.

doi:10.1007/s00034-023-02478-0

Published: 12 August 2023

Volume 43, pages 388–407 (2024)

Circuits, Systems, and Signal Processing

Abstract

Acoustic scene classification (ASC) is the task of identifying a scene from the environment in which its audio was recorded. This paper presents a bi-level, lightweight Convolutional Neural Network (CNN)-based model for ASC. The proposed approach classifies in two levels. At the first level, scenes are assigned to one of three broad categories: indoor, outdoor, or transportation. At the second level, each of the three categories is further divided into individual scenes. The approach uses three features: log Mel band energies, harmonic spectrograms, and percussive spectrograms. Three CNN classifiers are used: MobileNetV2, Squeeze-and-Excitation Net (SENet), and a combination of the two architectures, known as SE-MobileNet. The combined model exploits the advantages of both the MobileNetV2 and SENet architectures. Extensive experiments are conducted on the DCASE 2020 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset and the DCASE 2016 ASC dataset. The proposed SE-MobileNet model achieves classification accuracies of 96.9% and 86.6% at the first and second levels, respectively, on the DCASE 2020 dataset, and 97.6% and 88.4%, respectively, on the DCASE 2016 dataset. The proposed model is reported to outperform state-of-the-art low-complexity ASC systems in both complexity and accuracy.
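
The two-level decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scene grouping below uses DCASE 2020 Task 1 scene names and is an assumption, since the abstract does not list the scenes in each category. The function `classify_bilevel` and the toy scorers are hypothetical stand-ins for the trained first-level and second-level CNNs, which would operate on log Mel, harmonic, and percussive spectrogram features.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed grouping of individual scenes under the three first-level
# categories. Scene names follow the DCASE 2020 Task 1 label set; the
# paper's exact grouping is not given in the abstract.
SCENE_GROUPS: Dict[str, List[str]] = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_pedestrian", "street_traffic"],
    "transportation": ["bus", "metro", "tram"],
}

# A scorer maps a feature vector to one score per class.
Scorer = Callable[[Sequence[float]], List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def classify_bilevel(
    features: Sequence[float],
    level1: Scorer,
    level2: Dict[str, Scorer],
) -> Tuple[str, str]:
    """Pick a broad category first, then a scene within that category."""
    groups = list(SCENE_GROUPS)  # insertion order: indoor, outdoor, transportation
    group = groups[argmax(level1(features))]
    scenes = SCENE_GROUPS[group]
    scores = level2[group](features)
    if len(scores) != len(scenes):
        raise ValueError("second-level scorer must return one score per scene")
    return group, scenes[argmax(scores)]


# Toy scorers standing in for trained models (hypothetical values).
def toy_level1(features: Sequence[float]) -> List[float]:
    return [0.1, 0.2, 0.7]


toy_level2: Dict[str, Scorer] = {
    "indoor": lambda f: [1.0, 0.0, 0.0],
    "outdoor": lambda f: [0.0, 0.0, 1.0, 0.0],
    "transportation": lambda f: [0.2, 0.5, 0.3],
}

print(classify_bilevel([0.0], toy_level1, toy_level2))
# prints ('transportation', 'metro')
```

In this sketch only the second-level scorer for the chosen category is evaluated, so a first-level error cannot be corrected at the second level; whether the paper handles this case differently is not stated in the abstract.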

Availability of Data and Materials

The datasets discussed in the manuscript are publicly available for research purposes.

Code availability

Not applicable.

Funding

A doctoral fellowship from the Ministry of Human Resource Development (MHRD), Government of India, supported this research work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
Venkatesh Spoorthy & Shashidhar G. Koolagudi

Authors

Venkatesh Spoorthy
Shashidhar G. Koolagudi

Contributions

Conceptualization was contributed by SV and SGK. Literature review was contributed by SV. Writing (original draft preparation) was contributed by SV. Writing (review and editing) was contributed by SGK. Supervision was contributed by SGK.

Corresponding author

Correspondence to Venkatesh Spoorthy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

The authors adhere to all ethics declarations and guarantee that no discrepancies have occurred in the manuscript.

Consent for Publication

The authors and co-authors of the manuscript provide consent for publication in Circuits, Systems, and Signal Processing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Spoorthy, V., Koolagudi, S.G. Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model. Circuits Syst Signal Process 43, 388–407 (2024). https://doi.org/10.1007/s00034-023-02478-0

Received: 24 December 2022
Revised: 27 July 2023
Accepted: 27 July 2023
Published: 12 August 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00034-023-02478-0
