Sparse Representation Frameworks for Acoustic Scene Classification

Tyagi, Akansha; Rajan, Padmanabhan

doi:10.1007/978-3-031-48309-7_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

This work addresses acoustic scene classification (ASC) using sparse representation frameworks, motivated by the inherent sparseness of audio data. We explore three sparse representation classification (SRC) frameworks, each generating sparse acoustic scene representations. The first two frameworks produce linear and non-linear features, respectively. The third is a novel approach: a two-branch deep sparse auto-encoder (DSAE) representation framework that generates features that are both non-linear and discriminative. In the proposed framework, the first branch induces sparsity, while the second enforces discrimination within the learned sparse acoustic scene representations. These representations are then used to classify acoustic scene data into acoustic scene classes. We compare the three sparse frameworks by evaluating them on three ASC datasets. Our results indicate that DSAE-based acoustic scene representations outperform the sparse representations obtained from the other two frameworks, with an average performance gain of approximately 8% across all the ASC datasets.
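The abstract does not give the details of the three frameworks, so the following is only a minimal sketch of generic residual-based SRC with linear features, not the authors' implementation. It assumes a dictionary `D` whose columns are unit-norm training features, solves an L1-regularised least-squares problem with ISTA, and assigns the class whose atoms reconstruct the test vector with the smallest residual. The names `ista`, `src_predict`, `make_atoms`, the regularisation weight `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def ista(D, x, lam=0.05, n_iter=500):
    """Approximately solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1 using ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))  # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return a


def src_predict(D, labels, x, lam=0.05):
    """Return the class whose atoms give the smallest reconstruction residual."""
    a = ista(D, x, lam)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(x - D[:, labels == c] @ a[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Toy data (assumption): two "scene classes" clustered around different axes in R^8.
rng = np.random.default_rng(0)


def make_atoms(center, n):
    """Draw n noisy copies of `center` and normalise each column to unit length."""
    A = center[:, None] + 0.1 * rng.standard_normal((center.size, n))
    return A / np.linalg.norm(A, axis=0)


e = np.eye(8)
D = np.hstack([make_atoms(e[0], 5), make_atoms(e[1], 5)])
labels = np.array([0] * 5 + [1] * 5)
x = e[1] + 0.05 * rng.standard_normal(8)  # test vector close to class 1
pred = src_predict(D, labels, x)
```

In this sketch the sparsity comes from the L1 penalty; a non-linear or auto-encoder-based variant, as described in the abstract, would replace the fixed dictionary with learned features.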



Author information

Authors and Affiliations

School of Computing and Electrical Engineering, Indian Institute of Technology, Mandi, India
Akansha Tyagi & Padmanabhan Rajan

Corresponding author

Correspondence to Akansha Tyagi.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
Indian Institute of Information Technology Dharwad, Dharwad, India
K. T. Deepak
Indian Institute of Technology Dharwad, Dharwad, India
Rajesh M. Hegde
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal
Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna

About this paper

Cite this paper

Tyagi, A., Rajan, P. (2023). Sparse Representation Frameworks for Acoustic Scene Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_15

DOI: https://doi.org/10.1007/978-3-031-48309-7_15
Published: 22 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)
