Abstract
Environmental Sound Classification (ESC) has attracted increasing interest in recent years; it is a challenging non-speech audio event classification problem owing to the complexity of real-world acoustic environments. The accuracy of conventional methods depends heavily on the robustness of the extracted features and the effectiveness of the constructed model, which limits the adaptability of current approaches. To address this, a novel ESC scheme based on stacked Deep Neural Networks with multi-dimensional aggregated features is proposed. First, aggregated features combining time-domain and time–frequency (TF) domain features are used to capture a more comprehensive representation of sounds. Next, feature reduction based on Principal Component Analysis (PCA) is applied to select the most discriminative representations. Finally, a Stacked Deep Neural Network built on ensemble learning and data augmentation is presented to improve the scheme's generalization capability. The experimental results demonstrate that the proposed method is well suited to ESC, achieving accuracies of 96.1% and 98.1% on the ESC-10 and UrbanSound8K datasets, respectively, and outperforming most state-of-the-art ESC methods in both accuracy and computational cost.
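The three-stage pipeline described above (feature aggregation, PCA reduction, ensemble classification) can be sketched end to end. The following is a minimal, self-contained NumPy illustration, not the authors' implementation: toy time-domain and spectral statistics stand in for the paper's aggregated features, and a bootstrap nearest-centroid ensemble stands in for the stacked DNNs; all function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_features(signal, n_fft=512):
    """Toy stand-in for aggregated features: a few time-domain
    statistics plus magnitude-spectrum (TF-domain) statistics."""
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2  # zero-crossing rate proxy
    time_feats = np.array([signal.mean(), signal.std(), zcr])
    spec = np.abs(np.fft.rfft(signal, n=n_fft))
    tf_feats = np.array([spec.mean(), spec.std(), spec.max()])
    return np.concatenate([time_feats, tf_feats])

def pca_reduce(X, k):
    """Standardize, then project onto the top-k principal components (via SVD)."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:k].T

# Toy dataset: 40 synthetic "clips" from two classes that differ in energy.
X = np.stack([aggregate_features(rng.standard_normal(2048) * (1 + c))
              for c in (0, 1) for _ in range(20)])
y = np.repeat([0, 1], 20)

Z = pca_reduce(X, k=2)  # keep the two most discriminative directions

def ensemble_predict(Z_train, y_train, Z_test, n_models=5):
    """Average the class scores of several nearest-centroid models, each
    fit on a bootstrap resample -- a crude stand-in for the stacked DNNs."""
    scores = np.zeros((len(Z_test), 2))
    for _ in range(n_models):
        idx = rng.integers(0, len(Z_train), len(Z_train))  # bootstrap sample
        for c in (0, 1):
            centroid = Z_train[idx][y_train[idx] == c].mean(axis=0)
            scores[:, c] -= np.linalg.norm(Z_test - centroid, axis=1)
    return scores.argmax(axis=1)  # closer centroid -> higher score

pred = ensemble_predict(Z, y, Z)
print("training accuracy:", (pred == y).mean())
```

In practice the feature stage would use a library such as librosa (as cited in the paper) and the classifier stage trained DNNs, but the structure of the computation is the same.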
Acknowledgements
The work reported herein was funded jointly by the National Natural Science Foundation of China for Young Scholars (Grant No. 61801471), the Youth Innovation Promotion Association CAS (Grant No. 2021022), the Development Fund for Shanghai Talents (Grant No. 2020011), and the Jiading Youth Talents Program.
Author information
Contributions
Conceptualization, C.L.; methodology, C.L.; validation, C.L., F.H.; investigation, C.L.; writing—original draft preparation, F.H. and C.L.; visualization, C.L., Y.C., and Y.Z.; project administration, F.H. and H.F.; funding acquisition, F.H. and H.F. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, C., Hong, F., Feng, H. et al. Environmental Sound Classification Based on Stacked Concatenated DNN using Aggregated Features. J Sign Process Syst 93, 1287–1299 (2021). https://doi.org/10.1007/s11265-021-01702-x