research-article

A Multi-task Learning Approach Based on Convolutional Neural Network for Acoustic Scene Classification

Authors:

Xiao Song

ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence

Pages 23 - 27

https://doi.org/10.1145/3377713.3377720

Published: 07 February 2020

Abstract

Acoustic Scene Classification (ASC) aims to recognize the acoustic scene in which an audio signal was recorded. An acoustic scene is a mixture of background sounds and various sound events, and the sound events often determine the type of scene. However, few existing ASC methods explicitly exploit the important information carried by sound events. In this paper, we combine the ASC task with the Sound Event Detection (SED) task and propose a new convolutional neural network (CNN) approach based on multi-task learning (MTL). It uses SED as an auxiliary task so that the model pays more attention to sound-event information. In addition, sound events are characterized by high-energy time-frequency components. We therefore use Global Max Pooling (GMP) instead of the fully connected (FC) layer of a traditional CNN, which lets the model focus on the distinct high-energy time-frequency components of the audio signal (the sound events). Finally, we carry out extensive experiments on the TUT Acoustic Scenes 2017 dataset. Our proposed CNN approach with MTL generalizes better and improves the Unweighted Average Recall (UAR) by 5.2% over the DCASE 2017 ASC baseline system.
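
The following NumPy sketch illustrates three ideas from the abstract:

- GMP collapses each time-frequency feature map to its single strongest response.
- A joint loss treats scene classification as the main task and multi-label SED as the auxiliary task.
- UAR is the evaluation metric.

This is not the author's implementation, and several details are assumptions:

- The channel layout is hypothetical. The last convolutional layer is assumed to emit one map per class, in the spirit of Network in Network (Lin et al., 2013), so GMP yields logits directly with no FC layer.
- The SED weight `alpha` and the number of event classes are hypothetical.
- All function names are hypothetical.

The 15 scene classes match the TUT Acoustic Scenes 2017 dataset.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def global_max_pool(feature_maps):
    """Reduce (channels, time, freq) maps to one score per channel: the max over time and frequency."""
    return feature_maps.max(axis=(1, 2))


def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mtl_loss(last_conv_maps, scene_label, event_targets, n_scenes, alpha=0.5):
    """Joint ASC + SED loss computed from the last convolutional layer.

    Hypothetical layout: the first n_scenes channels score scenes and the remaining
    channels score sound events. GMP turns each map into a logit, so no FC layer is
    used. alpha (hypothetical) weights the auxiliary SED task.
    """
    logits = global_max_pool(last_conv_maps)
    scene_logits, event_logits = logits[:n_scenes], logits[n_scenes:]
    # Main task: single-label scene classification with cross-entropy.
    l_asc = -np.log(softmax(scene_logits)[scene_label] + EPS)
    # Auxiliary task: multi-label event presence with binary cross-entropy.
    p = sigmoid(event_logits)
    l_sed = -np.mean(event_targets * np.log(p + EPS)
                     + (1 - event_targets) * np.log(1 - p + EPS))
    return float(l_asc + alpha * l_sed)


def unweighted_average_recall(y_true, y_pred, n_classes):
    """Mean of per-class recalls; classes absent from y_true are skipped."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c)
               for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_scenes, n_events = 15, 6  # 15 scenes as in TUT 2017; 6 event classes is hypothetical
    maps = rng.normal(size=(n_scenes + n_events, 40, 64))  # (channels, frames, mel bins)
    events = rng.integers(0, 2, size=n_events).astype(float)
    print(mtl_loss(maps, scene_label=3, event_targets=events, n_scenes=n_scenes))
    print(unweighted_average_recall([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2], 3))  # ≈ 0.722
```

Taking the maximum, rather than flattening into an FC layer, means each logit is driven by the strongest time-frequency activation wherever it occurs. A short, loud event can therefore dominate the score, which is the intuition the abstract gives for GMP. UAR averages per-class recall, so each scene class counts equally regardless of how many test clips it has.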

Cited By

Tan Y, Ai H, Li S, Plumbley M (2024). Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 1286-1297. Online publication date: 2024.
https://doi.org/10.1109/TASLP.2024.3353578

Nigro M, Krishnan S (2023). SARdBScene: Dataset and Resnet Baseline for Audio Scene Source Counting and Analysis. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. Online publication date: 4-Jun-2023.
https://doi.org/10.1109/ICASSP49357.2023.10097115

Hou D, Wang S, Xing H (2021). A novel benchmark dataset of color steel sheds for remote sensing image retrieval. Earth Science Informatics, 14(2), 809-818. Online publication date: 24-Feb-2021.
https://doi.org/10.1007/s12145-021-00593-7

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems
      1. Digital signal processing

Information

Published In

ACM Other conferences

ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence

December 2019

614 pages

ISBN:9781450372619

DOI:10.1145/3377713

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Chinese University of Hong Kong

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Shenzhen Basic Research Program
Shenzhen Key Technological Project

Conference

ACAI 2019

ACAI 2019: 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence

December 20 - 22, 2019

Sanya, China

Acceptance Rates

ACAI '19 Paper Acceptance Rate 97 of 203 submissions, 48%;

Overall Acceptance Rate 173 of 395 submissions, 44%

Bibliometrics

Total citations: 3
Total downloads: 112
Downloads (last 12 months): 10
Downloads (last 6 weeks): 0

Reflects downloads up to 07 Mar 2025
