Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

Circuits, Systems, and Signal Processing

Abstract

Acoustic scene classification (ASC) is the task of identifying a scene from the environment in which its audio was recorded. This paper presents a bi-level, lightweight Convolutional Neural Network (CNN)-based model for ASC. The proposed approach classifies in two levels. At the first level, scenes are assigned to one of three broad categories: indoor, outdoor, or transportation. At the second level, each broad category is further divided into individual scenes. The approach uses three features: log Mel band energies, harmonic spectrograms, and percussive spectrograms. Three CNN classifiers are evaluated: MobileNetV2, Squeeze-and-Excitation Net (SENet), and SE-MobileNet, a combination of the two architectures that draws on the advantages of both. Extensive experiments are conducted on the DCASE 2020 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset and the DCASE 2016 ASC dataset. The proposed SE-MobileNet model achieves classification accuracies of 96.9% and 86.6% at the first and second levels, respectively, on the DCASE 2020 dataset, and 97.6% and 88.4%, respectively, on the DCASE 2016 dataset. The proposed model is reported to outperform state-of-the-art low-complexity ASC systems in both complexity and accuracy.
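The two-level decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scene names, the grouping in `SCENES`, the helper `bilevel_predict`, and the toy linear "models" are all assumptions made for illustration, standing in for trained CNNs such as SE-MobileNet. The sketch only shows the routing logic, in which a first-level classifier picks a broad category and a category-specific second-level classifier picks the scene.

```python
import numpy as np

# Hypothetical grouping of scenes under the three broad categories
# named in the abstract (the scene names are illustrative assumptions).
SCENES = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_traffic"],
    "transportation": ["bus", "metro", "tram"],
}
COARSE = list(SCENES)  # insertion order: indoor, outdoor, transportation


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def bilevel_predict(features, coarse_model, fine_models):
    """Level 1 picks a broad category; level 2 picks a scene within it."""
    p_coarse = softmax(coarse_model(features))
    coarse = COARSE[int(np.argmax(p_coarse))]
    # Only the second-level model of the chosen category is evaluated.
    p_fine = softmax(fine_models[coarse](features))
    scene = SCENES[coarse][int(np.argmax(p_fine))]
    return coarse, scene


def make_linear(weights):
    """Toy stand-in for a trained classifier: logits = W @ x."""
    w = np.asarray(weights, dtype=float)
    return lambda x: w @ np.asarray(x, dtype=float)


# Placeholder models; a real system would use trained CNNs here.
coarse_model = make_linear(np.eye(3))
fine_models = {name: make_linear(np.eye(3)[::-1]) for name in COARSE}

coarse, scene = bilevel_predict([0.1, 2.0, 0.3], coarse_model, fine_models)
print(coarse, scene)  # outdoor public_square
```

One consequence of this design is that a first-level error cannot be recovered at the second level, so second-level accuracy is bounded by first-level accuracy when both are measured end to end.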

Availability of Data and Materials

The datasets discussed in the manuscript are publicly available for research purposes.

Code availability

Not applicable.

Funding

This research work was supported by a doctoral fellowship from the Ministry of Human Resource Development (MHRD), Government of India.

Author information

Contributions

Conceptualization was contributed by SV and SGK. The literature review was contributed by SV. Writing (original draft preparation) was contributed by SV. Writing (review and editing) was contributed by SGK. Supervision was contributed by SGK.

Corresponding author

Correspondence to Venkatesh Spoorthy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

The authors adhere to all ethics declarations and guarantee that no discrepancies have occurred in the manuscript.

Consent for Publication

The authors and co-authors of the manuscript provide consent for publication in Circuits, Systems, and Signal Processing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Spoorthy, V., Koolagudi, S.G. Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model. Circuits Syst Signal Process 43, 388–407 (2024). https://doi.org/10.1007/s00034-023-02478-0
