
Learnt dictionary based active learning method for environmental sound event tagging

Published in Multimedia Tools and Applications.

Abstract

Sound event tagging is the process of assigning texts or labels to sound segments based on their salient features and/or annotations. In practice, annotation is expensive, so tagged sound segments are scarce, while untagged segments can be obtained easily and cheaply. Semi-automatic tagging, which assigns labels to large numbers of untagged sound segments from a small set of manually annotated ones, is therefore very important. Active learning is an effective technique for this problem: a selected subset of sound segments is manually tagged, while the remaining segments are tagged automatically. In this paper, a learnt dictionary based active learning method is proposed for environmental sound event tagging, which significantly reduces the annotation cost of semi-automatic tagging. The proposed method is built on a learnt dictionary, since dictionary learning is well suited to sound feature extraction. Tagging accuracy and annotation cost are used to measure the performance of the proposed method. Experimental results demonstrate that the proposed method achieves higher tagging accuracy at a much lower annotation cost than existing methods.
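The workflow the abstract describes can be sketched in miniature: learn a dictionary from segment features, represent each segment by its codes over that dictionary, query a human for tags on the most informative segments, and propagate tags to the rest. The sketch below is purely illustrative, not the authors' algorithm: it uses toy Gaussian "segments", a crude alternating-least-squares dictionary update as a stand-in for K-SVD/MOD, a farthest-from-tagged query rule as the uncertainty proxy, and nearest-neighbour label propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sound segment" features: two clusters standing in for two event classes.
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
y_true = np.array([0] * 20 + [1] * 20)

# Learn a small dictionary by alternating least squares
# (a crude stand-in for K-SVD / MOD dictionary learning).
n_atoms = 4
D = rng.normal(size=(n_atoms, 8))                       # atoms as rows
for _ in range(10):
    codes = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T  # code each segment
    D = np.linalg.lstsq(codes, X, rcond=None)[0]        # refit the atoms
codes = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T      # final representation

# Active learning loop: query the segments farthest (in code space) from any
# tagged one, simulating a human annotating only the most informative segments.
tagged = {0, 20}                                        # two seed annotations
for _ in range(4):
    untagged = [i for i in range(len(X)) if i not in tagged]
    dists = [min(np.linalg.norm(codes[i] - codes[j]) for j in tagged)
             for i in untagged]
    tagged.add(untagged[int(np.argmax(dists))])         # "manually" tag it

# Semi-automatic step: propagate tags by nearest tagged neighbour.
pred = np.empty(len(X), dtype=int)
for i in range(len(X)):
    nearest = min(tagged, key=lambda j: np.linalg.norm(codes[i] - codes[j]))
    pred[i] = y_true[nearest]

accuracy = (pred == y_true).mean()
print(f"tagged manually: {len(tagged)} / {len(X)}, accuracy: {accuracy:.2f}")
```

With only 6 of 40 segments tagged by the simulated annotator, the remaining 34 receive labels automatically; the paper's contribution lies in choosing the dictionary and the query rule so that this trade-off between annotation cost and tagging accuracy is as favourable as possible.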



Acknowledgements

This work is partially supported by the National Natural Science Foundation of Guangxi under Grant (2016GXNSFAA380209, 2014GXNSFDA118037), the Natural Science Foundation of Zhejiang Province (No. LY18F010008), the “BAGUI Scholar” Program of Guangxi Zhuang Autonomous Region of China, the project of Scientific Research and Technology Development (AB16380272, AA18118047) in Guangxi, and the project of Scientific Research and Technology Development (#20175177) in Guangxi Nanning.

Author information

Corresponding author: Wanting Ji.


Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Qin, X., Ji, W., Wang, R. et al. Learnt dictionary based active learning method for environmental sound event tagging. Multimed Tools Appl 78, 29493–29508 (2019). https://doi.org/10.1007/s11042-018-7139-2

