
Environmental Sound Classification Based on Stacked Concatenated DNN using Aggregated Features

Published in: Journal of Signal Processing Systems

Abstract

In recent years, interest in Environmental Sound Classification (ESC) has grown steadily. ESC is a challenging non-speech audio event classification problem because of the complexity of real-world acoustic environments. The classification accuracy of conventional methods depends heavily on the robustness of the representative features and the effectiveness of the constructed model, which limits the adaptability of current approaches. To address this, a novel ESC scheme based on stacked Deep Neural Networks (DNNs) with multi-dimensional aggregated features is proposed. First, aggregated features composed of time-domain and time–frequency (TF) domain features are used to capture a more comprehensive representation of sounds. Next, feature reduction based on Principal Component Analysis (PCA) is employed to select the most discriminative representations. Finally, a novel stacked DNN based on ensemble learning and data augmentation is presented to improve the scheme's generalization capability. Experimental results demonstrate that the proposed method is well suited to ESC problems: it achieves accuracy scores of 96.1% and 98.1% on the ESC-10 and UrbanSound8K datasets, respectively, and outperforms most state-of-the-art ESC methods in terms of both accuracy and computational burden.
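The front end described in the abstract (aggregate time-domain and TF-domain features per clip, then reduce them with PCA) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the specific descriptors (zero-crossing rate, RMS energy, spectral centroid and bandwidth) and the tiny feature dimensions are stand-in assumptions chosen so the example is self-contained with NumPy only.

```python
import numpy as np

def time_domain_features(x):
    # Zero-crossing rate and RMS energy: two simple time-domain descriptors.
    zcr = np.mean(np.abs(np.diff(np.signbit(x).astype(int))))
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([zcr, rms])

def tf_domain_features(x, n_fft=256):
    # Magnitude-spectrum statistics as stand-ins for TF-domain features
    # (frame-level analysis and log-mel/MFCC extraction are omitted here).
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    bins = np.arange(mag.size)
    centroid = np.sum(bins * mag) / (np.sum(mag) + 1e-12)
    bandwidth = np.sqrt(np.sum(((bins - centroid) ** 2) * mag)
                        / (np.sum(mag) + 1e-12))
    return np.array([centroid, bandwidth])

def aggregate(x):
    # Concatenate both feature families into one aggregated vector per clip.
    return np.concatenate([time_domain_features(x), tf_domain_features(x)])

def pca_reduce(F, k):
    # PCA via SVD on the mean-centred feature matrix F (rows = clips):
    # project onto the top-k right singular vectors.
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:k].T

rng = np.random.default_rng(0)
clips = [rng.standard_normal(1024) for _ in range(8)]  # toy "audio clips"
F = np.stack([aggregate(c) for c in clips])  # (8, 4) aggregated features
Z = pca_reduce(F, k=2)                       # (8, 2) reduced representation
print(F.shape, Z.shape)
```

In the paper's pipeline the reduced vectors would then be fed to the stacked DNN ensemble; here the reduction step simply keeps the directions of largest variance across clips.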



Acknowledgements

The work reported herein was funded jointly by the National Natural Science Foundation of China for Young Scholars (Grant No. 61801471), the Youth Innovation Promotion Association CAS (Grant No. 2021022), the Development Fund for Shanghai Talents (Grant No. 2020011), and the Jiading Youth Talents Program.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, C.L.; methodology, C.L.; validation, C.L. and F.H.; investigation, C.L.; writing—original draft preparation, F.H. and C.L.; visualization, C.L., Y.C., and Y.Z.; project administration, F.H. and H.F.; funding acquisition, F.H. and H.F. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Feng Hong.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, C., Hong, F., Feng, H. et al. Environmental Sound Classification Based on Stacked Concatenated DNN using Aggregated Features. J Sign Process Syst 93, 1287–1299 (2021). https://doi.org/10.1007/s11265-021-01702-x

