Abstract
Acoustic Scene Classification (ASC) is the task of identifying a scene from sound cues and assigning a label to it. Over the past two years, the datasets released for ASC have included audio samples recorded with multiple devices, bringing the problem closer to real-world scenarios. We therefore aim to develop a device-robust ASC model using audio samples recorded with three different devices. The dataset considered is DCASE 2019 ASC Task 1a, which covers the primary recording device (Device A) and two mobile devices (Devices B and C). This work introduces the Adaptive Noise Reduction (ANR) technique to reduce the device distortion present in the audio samples from Devices B and C. Spectrograms are extracted from all audio samples and normalized to remove bias in the input signal. The normalized features are fed to a lightweight Convolutional Recurrent Attention Neural Network (CRANN) to perform ASC. The key contributions of this work are the reduction of device distortion for mismatched devices and the introduction of an attention layer into the Convolutional Recurrent Neural Network, yielding the CRANN. The proposed method shows a considerable improvement in ASC accuracy for mismatched devices.
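The abstract gives no implementation details, so the following is only a minimal sketch of two of the steps it names: spectrogram normalization and attention over time frames. It assumes per-frequency-bin standardization and a simple dot-product attention pooling, neither of which is confirmed by the source. The function names, frame sizes, and random inputs are hypothetical, and the ANR step and the convolutional and recurrent layers are not modelled.

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512, eps=1e-10):
    """Magnitude STFT in dB, shape (frames, bins). Frame sizes are assumed values."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(mag + eps)

def normalize(spec, eps=1e-8):
    """Zero mean and unit variance per frequency bin (one common choice, assumed here)."""
    mean = spec.mean(axis=0, keepdims=True)
    std = spec.std(axis=0, keepdims=True)
    return (spec - mean) / (std + eps)

def attention_pool(features, w):
    """Softmax attention over time: features (T, D), w (D,) -> pooled (D,), weights (T,)."""
    scores = features @ w
    scores = scores - scores.max()          # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights

# Stand-in input: one second of random noise at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
spec = normalize(log_spectrogram(audio))
# In a trained model w would be learned; here it is random for illustration.
pooled, weights = attention_pool(spec, rng.standard_normal(spec.shape[1]))
```

In a full model the attention weights would be computed on the recurrent layer's outputs rather than on the raw spectrogram; the pooling is applied directly to the features here only to keep the sketch self-contained.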
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Venkatesh, S., Koolagudi, S.G. (2022). Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_58
DOI: https://doi.org/10.1007/978-3-031-20980-2_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)