Abstract
This paper proposes a new dictionary-based active learning method for sound event classification, which significantly reduces the required amount of labeled samples in the process of classifier training. Active learning is a process of selecting samples to be labeled. In our method, the active learning is based on clustering. We use dictionary-based clustering as the dictionary learning is more suitable to sound event classification. Our classifier will be trained using both unlabelled sound segments (that have predicted labels), and a small number of labeled samples. The proposed method and other reference methods are implemented on a public urban sound dataset with 8732 sound segments, the classification accuracy is used to measure the performance of these classifiers. Experimental results show that the proposed method has higher classification accuracy but requires much less labeled samples than other methods.
Similar content being viewed by others
References
Barkana BD, Uzkent B (2011) Environmental noise classifier using a new set of feature parameters based on pitch range. Appl Acoust 72(11):841–848
Chu S, Narayanan S, Jay Kuo C-C (2009) Environmental sound recognition with time-frequency audio features. IEEE Trans Audio Speech Lang Process 17(6):1142–1158
Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–221
Duan S, Zhang J, Roe P, Towsey M (2012) A survey of tagging techniques for music, speech and environmental sound. Artif Intell Rev 42(4):637–661
Fleury A, Noury N, Vacher M, Glasson H, Seri JF (2008) Sound and speech detection and classification in a health smart home. In: Proc. IEEE Int. Conf. Engineering in Medicine and Biology Society, p 4644–4647
Foggia P, Petkov N, Saggese A, Strisciuglio N, Vento M (2016) Audio surveillance of roads: a system for detecting anomalous sounds. IEEE Trans Intell Transp Syst 17(1):279–288
Gadde A, Anis A, Ortega A (2014) Active semi-supervised learning using sampling theory for graph signals. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, p 492–501
Ghofrani S, McLernon DC, Ayatollahi A (2003) Comparing Gaussian and chirplet dictionaries for time-frequency analysis using matching pursuit decomposition. In: Signal Processing and Information Technology, 2003. ISSPIT 2003. Proceedings of the 3rd IEEE International Symposium on. IEEE, p 713–716
Gold B, Morgan N, Ellis D (2011) Speech and audio signal processing: processing and perception of speech and music. Wiley, Hoboken
Han W, Coutinho E, Ruan H, Li H, Schuller B, Yu X, Zhu X (2016) Semi-supervised active learning for sound classification in hybrid learning environments. PLoS One 11(9):e0162075
Krogh A, Vedelsby J (1995) Neural network ensembles, cross validation, and active learning. In: Advances in neural information processing systems, p 231–238
Lei C, Zhu X (2017) Unsupervised feature selection via local structure learning and sparse learning. Multimedia Tools and Appl: 1–18
Maijala P, Shuyang Z, Heittola T, Virtanen T (2018) Environmental noise monitoring using source classification in sensors. Appl Acoust 129:258–267
Mallat SG, Zhang Z (1993) Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process 41(12):3397–3415
Morrison D, Wang R, De Silva LC (2007) Ensemble methods for spoken emotion recognition in call-centres. Speech Comm 49(2):98–112
Park H-S, Jun C-H (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36(2):3336–3341
Phuong NC, Dat TD (2013) Sound classification for event detection: Application into medical telemonitoring. In: Proc. Int. Conf. Computing, Management and Telecommunications (ComManTel), p 330–333
Piczak KJ (2015) ESC: dataset for environmental sound classification. In: Proc. ACM Int. Conf. Multimedia, p 1015–1018
Ren J, Jiang X, Yuan J, Magnenat-Thalmann N (2017) Sound-event classification using robust texture features for robot hearing. IEEE Trans Multimedia 19(3):447–458
Riccardi G, Hakkani-Tur D (2005) Active learning: theory and applications to automatic speech recognition. IEEE Trans Speech Audio Process 13(4):504–511
Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Salamon J, Jacoby C, Bello JP (2014) A dataset and taxonomy for urban sound research. In: Proceedings of the 22nd ACM international conference on Multimedia. ACM, p. 1041–1044
Schröder J, Anemiiller J, Goetze S (2016) Classification of human cough signals using spectro-temporal Gabor filterbank features. In: Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, p. 6455–6459
Sharan RV, Moir TJ (2017) Robust acoustic event classification using deep neural networks. Inf Sci 396:24–32
Shuyang Z, Heittola T, Virtanen T (2017) Active learning for sound event classification by clustering unlabeled data. In: Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, p 751–755
Sugden P, Canagarajah N (2004) Underdetermined noisy blind separation using dual matching pursuits. In: Acoustics, Speech, and Signal Processing (ICASSP'04). IEEE International Conference on, vol. 5, p V-557. IEEE
Vera-Candeas P, Ruiz-Reyes N, Rosa-Zurera M, Martinez-Munoz D, López-Ferreras F (2004) Transient modeling by matching pursuits with a wavelet dictionary for parametric audio coding. IEEE Signal Process Lett 11(3):349–352
Wang R, Zong M (2018) Unsupervised feature selection based on self-representation and subspace learning. World Wide Web. https://doi.org/10.1007/s11280-017-0508-3
Wang J-C, Lin C-H, Chen B-W, Tsai M-K (2014) Gabor-based nonuniform scale-frequency map for environmental sound classification in home automation. IEEE Trans Autom Sci Eng 11(2):607–613
Wang C-Y, Wang J-C, Santoso A, Chiang C-C, Wu C-H (2017) Sound event recognition using auditory-receptive-field binary pattern and hierarchical-diving deep belief network. IEEE/ACM Transactions on Audio, Speech, and Language Processing, p 1–16
Wang R, Ji W, Liu M, Wang X, Weng J, Deng S, Gao S, Yuan C-a. (2018) Review on mining data from multiple data sources. Pattern Recogn Lett
Zhang Z, Schuller B (2012) Semi-supervised learning helps in sound event classification. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, p 333–336
Zhang S, Li X, Zong M, Zhu X, Wang R (2017) Efficient knn classification with different numbers of nearest neighbors. IEEE Trans Neural Netw Learn Syst
Zheng W, Zhu X, Zhu Y, Hu R, Lei C (2017) Dynamic graph learning for spectral feature selection. Multimed Tools Appl: 1–17
Zhu X (2006) Semi-supervised learning literature survey. University of Wisconsin-Madison, Technical Report 1530, Wisconsin
Zhu X, Zhang S, Hu R, Zhu Y (2018) Local and global structure preservation for robust unsupervised spectral feature selection. IEEE Trans Knowl Data Eng 30(3):517–529
Acknowledgments
This work was supported in part by the Natural Science Foundation of Zhejiang Province (No. LY18F010008) and the Marsden Fund of New Zealand.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Ji, W., Wang, R. & Ma, J. Dictionary-based active learning for sound event classification. Multimed Tools Appl 78, 3831–3842 (2019). https://doi.org/10.1007/s11042-018-6380-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-6380-z