Abstract
Sound event tagging assigns text labels to sound segments based on their salient features and/or annotations. In practice, manual annotation is expensive, so tagged sound segments are scarce, whereas untagged sound segments can be obtained easily and cheaply. Semi-automatic tagging, which assigns labels to a large number of untagged sound segments based on a small number of manually annotated ones, is therefore very important. Active learning is an effective technique for this problem: selected sound segments are tagged manually, and the remaining sound segments are tagged automatically. In this paper, a learnt dictionary based active learning method is proposed for environmental sound event tagging, which can significantly reduce the annotation cost of semi-automatic tagging. The proposed method is based on a learnt dictionary because dictionary learning adapts better to sound feature extraction. Tagging accuracy and annotation cost are used to measure the performance of the proposed method. Experimental results demonstrate that the proposed method achieves higher tagging accuracy at a much lower annotation cost than other existing methods.
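The split between manual and automatic tagging, and the two performance measures, can be illustrated with a minimal sketch. It is not the method proposed in the paper: the feature vectors stand in for representations that a learnt dictionary would produce, the greedy farthest-first selection and nearest-neighbour tag propagation are assumed placeholders, and all names (`farthest_first`, `semi_automatic_tagging`, `oracle`, `budget`) and the toy data are hypothetical.

```python
import math

def farthest_first(features, budget):
    """Pick `budget` mutually distant segment indices (greedy farthest-first).

    Assumed selection rule for illustration only, not the paper's criterion.
    """
    chosen = [0]
    dist = [math.dist(f, features[0]) for f in features]
    while len(chosen) < budget:
        nxt = max(range(len(features)), key=lambda i: dist[i])
        chosen.append(nxt)
        # Track each segment's distance to its closest selected segment.
        dist = [min(d, math.dist(f, features[nxt])) for d, f in zip(dist, features)]
    return chosen

def semi_automatic_tagging(features, oracle, budget):
    """Tag `budget` selected segments via `oracle` (the simulated annotator);
    tag every other segment with the label of its nearest annotated segment."""
    selected = farthest_first(features, budget)
    manual = {i: oracle(i) for i in selected}
    tags = []
    for i, f in enumerate(features):
        if i in manual:
            tags.append(manual[i])
        else:
            nearest = min(selected, key=lambda j: math.dist(f, features[j]))
            tags.append(manual[nearest])
    return tags, selected

# Toy feature vectors (placeholders for dictionary-based features).
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
truth = ["rain", "rain", "rain", "siren", "siren", "siren"]

tags, selected = semi_automatic_tagging(features, lambda i: truth[i], budget=2)
accuracy = sum(t == g for t, g in zip(tags, truth)) / len(truth)  # tagging accuracy
cost = len(selected) / len(features)  # annotation cost: share tagged by hand
print(f"accuracy={accuracy:.2f}, cost={cost:.2f}")
```

In this sketch, raising `budget` increases the annotation cost, and the trade-off against tagging accuracy is what the two measures are meant to capture.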



Acknowledgements
This work is partially supported by the National Natural Science Foundation of Guangxi under Grant (2016GXNSFAA380209, 2014GXNSFDA118037), the Natural Science Foundation of Zhejiang Province (No. LY18F010008), the “BAGUI Scholar” Program of Guangxi Zhuang Autonomous Region of China, the project of Scientific Research and Technology Development (AB16380272, AA18118047) in Guangxi, and the project of Scientific Research and Technology Development (#20175177) in Guangxi Nanning.
Cite this article
Qin, X., Ji, W., Wang, R. et al. Learnt dictionary based active learning method for environmental sound event tagging. Multimed Tools Appl 78, 29493–29508 (2019). https://doi.org/10.1007/s11042-018-7139-2
DOI: https://doi.org/10.1007/s11042-018-7139-2