Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network

Venkatesh, Spoorthy; Koolagudi, Shashidhar G.

doi:10.1007/978-3-031-20980-2_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Acoustic Scene Classification (ASC) is the task of identifying a scene from sound cues and assigning a label to it. Over the past two years, the datasets released for ASC have included audio samples recorded with multiple devices, bringing the problem closer to real-world scenarios. We therefore aim to develop a device-robust ASC model for audio samples recorded with three different devices. The dataset considered is DCASE 2019 ASC task 1a, which covers the primary recording device (Device A) and two mobile devices (Devices B and C). This work introduces the Adaptive Noise Reduction (ANR) technique to reduce the device distortion present in audio samples from Devices B and C. Spectrograms are extracted from all audio samples and normalized to remove bias in the input signal. The normalized features are fed to a lightweight Convolutional Recurrent Attention Neural Network (CRANN) to perform ASC. The key contributions of this work are the reduction of device distortion for mismatched devices and the introduction of an attention layer into the Convolutional Recurrent Neural Network. The results show a considerable improvement in accuracy for mismatched-device ASC.
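The normalization and attention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the normalization scheme or the form of the attention layer, so per-frequency-bin standardization and softmax-weighted pooling over time frames are assumed here. The function names `normalize_per_bin` and `attention_pool` are hypothetical, and the attention scores, which a trained layer would produce, are passed in by hand.

```python
import math
from statistics import fmean, pstdev


def normalize_per_bin(spec, eps=1e-8):
    """Standardize each frequency bin to zero mean and unit variance over time.

    spec is a list of frames; each frame is a list of (log-)magnitudes,
    one value per frequency bin. This scheme is an assumption, not the
    normalization used in the paper.
    """
    n_bins = len(spec[0])
    columns = [[frame[b] for frame in spec] for b in range(n_bins)]
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(frame[b] - means[b]) / (stds[b] + eps) for b in range(n_bins)]
        for frame in spec
    ]


def attention_pool(frames, scores):
    """Collapse a sequence of frames into one vector using softmax weights.

    scores holds one unnormalized relevance score per frame. In a trained
    network these would come from a learned layer; here they are inputs.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_feat = len(frames[0])
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frames))
        for i in range(n_feat)
    ]
    return pooled, weights


# Toy input: 3 time frames, 2 frequency bins on very different scales.
spec = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized = normalize_per_bin(spec)
pooled, weights = attention_pool(normalized, scores=[0.0, 1.0, 2.0])
```

After normalization both bins share the same scale, so a recording whose spectrum is uniformly offset or scaled in a given bin maps to the same features. The pooling step weights later frames more heavily in this toy call only because of the hand-picked scores.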

Author information

Authors and Affiliations

National Institute of Technology Karnataka, Surathkal, India
Spoorthy Venkatesh & Shashidhar G. Koolagudi

Corresponding author

Correspondence to Spoorthy Venkatesh.

Editor information

Editors and Affiliations

Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna
St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal

About this paper

Cite this paper

Venkatesh, S., Koolagudi, S.G. (2022). Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_58

DOI: https://doi.org/10.1007/978-3-031-20980-2_58
Published: 10 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science; Computer Science (R0)
