DOI:10.1145/3377713.3377720
research-article

A Multi-task Learning Approach Based on Convolutional Neural Network for Acoustic Scene Classification

Published: 07 February 2020

Abstract

Acoustic Scene Classification (ASC) aims to recognize the acoustic scene in an audio recording. An acoustic scene is a mixture of background sounds and various sound events, and the sound events often determine the type of scene. However, few existing ASC methods exploit the information carried by sound events. In this paper, we combine the ASC task with the Sound Event Detection (SED) task and propose a new CNN approach with Multi-task Learning (MTL), which uses SED as an auxiliary task so that the model pays more attention to sound-event information. In addition, because sound events are characterized by high-energy time-frequency components, we use Global Max Pooling (GMP) instead of the Fully Connected (FC) layer of the traditional CNN. The advantage is that the model focuses on the distinct high-energy time-frequency components of the audio signal (the sound events). Finally, extensive experiments are carried out on the TUT Acoustic Scenes 2017 dataset. Our proposed CNN approach with MTL generalizes better and improves the Unweighted Average Recall (UAR) by 5.2% over the DCASE 2017 ASC baseline system.
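
The three ingredients named in the abstract (global max pooling in place of a fully connected layer, a shared representation feeding a scene head and an auxiliary event head, and UAR as the metric) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The tensor layout, the sigmoid multi-label event head, the weighted-sum loss, and the `aux_weight` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def global_max_pool(feature_maps):
    """Reduce (batch, channels, freq, time) maps to (batch, channels) by
    keeping the largest activation of each channel (assumed layout)."""
    return feature_maps.max(axis=(2, 3))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multitask_loss(pooled, w_scene, w_event, scene_labels, event_labels,
                   aux_weight=0.5):
    """Scene cross-entropy plus a weighted multi-label event loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    Both heads read the same pooled features, which is the shared part of
    the multi-task setup sketched here.
    """
    eps = 1e-12
    scene_prob = softmax(pooled @ w_scene)   # one scene per clip
    event_prob = sigmoid(pooled @ w_event)   # several events may co-occur
    n = len(scene_labels)
    scene_ce = -np.log(scene_prob[np.arange(n), scene_labels] + eps).mean()
    event_bce = -(event_labels * np.log(event_prob + eps)
                  + (1 - event_labels) * np.log(1 - event_prob + eps)).mean()
    return scene_ce + aux_weight * event_bce

def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy usage with random "CNN outputs": 4 clips, 8 channels, 5x6 maps,
# 3 scene classes and 2 event classes (all sizes are arbitrary).
rng = np.random.default_rng(0)
maps = rng.normal(size=(4, 8, 5, 6))
pooled = global_max_pool(maps)
w_scene = rng.normal(size=(8, 3))
w_event = rng.normal(size=(8, 2))
scene_labels = np.array([0, 1, 2, 1])
event_labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
loss = multitask_loss(pooled, w_scene, w_event, scene_labels, event_labels)

# Recall is 1.0 for class 0 and 0.0 for class 1, so UAR is 0.5 even though
# plain accuracy would be 0.75.
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 0, 0])
```

Because UAR averages recall per class, a classifier that ignores a rare scene is penalized more heavily than plain accuracy would suggest.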

Cited By

  • (2024) Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32 (1286-1297). DOI: 10.1109/TASLP.2024.3353578. Online publication date: 2024
  • (2023) SARdBScene: Dataset and Resnet Baseline for Audio Scene Source Counting and Analysis. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10097115. Online publication date: 4-Jun-2023
  • (2021) A novel benchmark dataset of color steel sheds for remote sensing image retrieval. Earth Science Informatics, 14:2 (809-818). DOI: 10.1007/s12145-021-00593-7. Online publication date: 24-Feb-2021

Published In

ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence
December 2019
614 pages
ISBN:9781450372619
DOI:10.1145/3377713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Chinese University of Hong Kong

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Acoustic Scene Classification
  2. Convolutional Neural Network
  3. Multi-task Learning
  4. Sound Event Detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Shenzhen Basic Research Program
  • Shenzhen Key Technological Project

Conference

ACAI 2019

Acceptance Rates

ACAI '19 Paper Acceptance Rate 97 of 203 submissions, 48%;
Overall Acceptance Rate 173 of 395 submissions, 44%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 07 Mar 2025
