
Learnt dictionary based active learning method for environmental sound event tagging

Published in Multimedia Tools and Applications.

Abstract

Sound event tagging is the process of assigning texts or labels to sound segments based on their salient features and/or annotations. In practice, annotation is expensive, so tagged sound segments are scarce, while untagged segments can be obtained easily and cheaply. Semi-automatic tagging, which assigns labels to large numbers of untagged sound segments from a small set of manually annotated ones, is therefore very important. Active learning is an effective technique for this problem: a selected subset of sound segments is manually tagged, while the remaining segments are tagged automatically. In this paper, a learnt dictionary based active learning method is proposed for environmental sound event tagging, which significantly reduces the annotation cost of semi-automatic tagging. The proposed method is built on a learnt dictionary, since dictionary learning is well suited to sound feature extraction. Tagging accuracy and annotation cost are used to measure the performance of the proposed method. Experimental results demonstrate that the proposed method achieves higher tagging accuracy at a much lower annotation cost than existing methods.
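The workflow the abstract describes can be sketched in miniature: learn a dictionary from segment features, represent each segment by its codes over that dictionary, query a human for tags on the most informative segments, and propagate tags to the rest. The sketch below is purely illustrative, not the authors' algorithm: it uses toy Gaussian "segments", a crude alternating-least-squares dictionary update as a stand-in for K-SVD/MOD, a farthest-from-tagged query rule as the uncertainty proxy, and nearest-neighbour label propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sound segment" features: two clusters standing in for two event classes.
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
y_true = np.array([0] * 20 + [1] * 20)

# Learn a small dictionary by alternating least squares
# (a crude stand-in for K-SVD / MOD dictionary learning).
n_atoms = 4
D = rng.normal(size=(n_atoms, 8))                       # atoms as rows
for _ in range(10):
    codes = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T  # code each segment
    D = np.linalg.lstsq(codes, X, rcond=None)[0]        # refit the atoms
codes = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T      # final representation

# Active learning loop: query the segments farthest (in code space) from any
# tagged one, simulating a human annotating only the most informative segments.
tagged = {0, 20}                                        # two seed annotations
for _ in range(4):
    untagged = [i for i in range(len(X)) if i not in tagged]
    dists = [min(np.linalg.norm(codes[i] - codes[j]) for j in tagged)
             for i in untagged]
    tagged.add(untagged[int(np.argmax(dists))])         # "manually" tag it

# Semi-automatic step: propagate tags by nearest tagged neighbour.
pred = np.empty(len(X), dtype=int)
for i in range(len(X)):
    nearest = min(tagged, key=lambda j: np.linalg.norm(codes[i] - codes[j]))
    pred[i] = y_true[nearest]

accuracy = (pred == y_true).mean()
print(f"tagged manually: {len(tagged)} / {len(X)}, accuracy: {accuracy:.2f}")
```

With only 6 of 40 segments tagged by the simulated annotator, the remaining 34 receive labels automatically; the paper's contribution lies in choosing the dictionary and the query rule so that this trade-off between annotation cost and tagging accuracy is as favourable as possible.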



Acknowledgements

This work is partially supported by the National Natural Science Foundation of Guangxi under Grant (2016GXNSFAA380209, 2014GXNSFDA118037), the Natural Science Foundation of Zhejiang Province (No. LY18F010008), the “BAGUI Scholar” Program of Guangxi Zhuang Autonomous Region of China, the project of Scientific Research and Technology Development (AB16380272, AA18118047) in Guangxi, and the project of Scientific Research and Technology Development (#20175177) in Guangxi Nanning.

Author information

Corresponding author: Wanting Ji.


Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Qin, X., Ji, W., Wang, R. et al. Learnt dictionary based active learning method for environmental sound event tagging. Multimed Tools Appl 78, 29493–29508 (2019). https://doi.org/10.1007/s11042-018-7139-2

