Dictionary-based active learning for sound event classification

Multimedia Tools and Applications

Abstract

This paper proposes a new dictionary-based active learning method for sound event classification, which significantly reduces the number of labeled samples required to train a classifier. Active learning is the process of selecting which samples should be labeled. In our method, active learning is based on clustering. We use dictionary-based clustering because dictionary learning is better suited to sound event classification. The classifier is trained on both unlabeled sound segments, which are assigned predicted labels, and a small number of labeled samples. The proposed method and several reference methods are evaluated on a public urban sound dataset of 8732 sound segments, with classification accuracy used to measure performance. Experimental results show that the proposed method achieves higher classification accuracy than the other methods while requiring far fewer labeled samples.
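The general idea of clustering-based active learning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain k-means stands in for the dictionary-based clustering, a nearest-class-mean rule stands in for the classifier, and a toy 2-D dataset stands in for audio features. All function names (`farthest_point_init`, `kmeans`, `cluster_active_learning`, `fit_class_means`, `predict`) and the `oracle` callback are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    # Deterministic initialisation: start at X[0], then repeatedly add
    # the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=10):
    # Plain k-means, used here only as a stand-in for dictionary-based clustering.
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, assign

def cluster_active_learning(X, oracle, k):
    # Query one label per cluster (the member closest to the cluster center)
    # and propagate that label to every other member as a predicted label.
    centers, assign = kmeans(X, k)
    predicted = np.empty(len(X), dtype=int)
    queried = []
    for j in range(k):
        idx = np.flatnonzero(assign == j)
        if idx.size == 0:
            continue
        rep = int(idx[np.linalg.norm(X[idx] - centers[j], axis=1).argmin()])
        queried.append(rep)
        predicted[idx] = oracle(rep)
    return predicted, queried

def fit_class_means(X, labels):
    # Toy classifier: one mean vector per class.
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, means

def predict(model, X_new):
    classes, means = model
    d = np.linalg.norm(X_new[:, None] - means[None], axis=2)
    return classes[d.argmin(axis=1)]

# Toy data: two well-separated 3x3 grids of points, one per class.
offsets = np.array([[dx, dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
X = np.vstack([offsets, offsets + 10.0])
y_true = np.array([0] * 9 + [1] * 9)

# The oracle plays the role of the human annotator.
predicted, queried = cluster_active_learning(X, lambda i: y_true[i], k=2)
model = fit_class_means(X, predicted)
test_pred = predict(model, np.array([[0.2, 0.1], [9.8, 10.3]]))
```

In this toy setting only two of the 18 samples are sent to the annotator, and the remaining labels are inferred from cluster membership. On real audio the clusters would not be perfectly pure, so the propagated labels would contain errors that the trained classifier has to tolerate.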

Acknowledgments

This work was supported in part by the Natural Science Foundation of Zhejiang Province (No. LY18F010008) and the Marsden Fund of New Zealand.

Author information

Corresponding author

Correspondence to Wanting Ji.

Cite this article

Ji, W., Wang, R. & Ma, J. Dictionary-based active learning for sound event classification. Multimed Tools Appl 78, 3831–3842 (2019). https://doi.org/10.1007/s11042-018-6380-z

  • DOI: https://doi.org/10.1007/s11042-018-6380-z
