Auditory Scene Classification with Deep Belief Network

Xue, Like; Su, Feng

doi:10.1007/978-3-319-14445-0_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8935)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Effective modeling and analysis of auditory scenes is crucial to many context-aware and content-based multimedia applications. In this paper, we explore how effectively a multi-layer generative deep neural network can discover the underlying higher-level, highly non-linear probabilistic representations of acoustic data from unstructured auditory scenes. We first create a more compact and representative description of the input audio clip by focusing on the salient regions of the data and modeling their contextual correlations. Next, we use a deep belief network (DBN) to discover, without supervision, high-level descriptions of the scene audio in the form of the activations of units in the higher hidden layers of the trained DBN. These descriptions are finally assigned to a scene category either by the discriminative output layer of the DBN or by a separate classifier such as a support vector machine (SVM). Experimental results demonstrate the effectiveness of the proposed DBN-based approach to auditory scene classification.
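The unsupervised step described in the abstract can be illustrated with a minimal sketch of greedy layer-wise DBN pretraining, in which each layer is a restricted Boltzmann machine (RBM) trained with one-step contrastive divergence. This is not the authors' implementation: the layer sizes, learning rate, epoch count, the random toy inputs, and all names (`RBM`, `pretrain_dbn`, `dbn_features`) are assumptions made for illustration. The code uses only the standard library so that it runs anywhere; a practical system would likely rely on a numerical library.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def train(self, data, epochs=20, lr=0.1):
        # Hypothetical hyperparameters, chosen only to keep the toy run short.
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h0_sample)   # reconstruction
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_vis[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data, layer_sizes, seed=0):
    """Train RBMs one at a time; each layer's activations feed the next layer."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        rbm.train(layer_input)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


def dbn_features(rbms, v):
    """Propagate one input upward and return the top hidden layer's activations."""
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v


# Toy stand-in for per-clip descriptors scaled to [0, 1] (assumed, not real audio).
toy_rng = random.Random(1)
toy_clips = [[toy_rng.random() for _ in range(12)] for _ in range(30)]
dbn = pretrain_dbn(toy_clips, layer_sizes=[8, 4])
features = [dbn_features(dbn, clip) for clip in toy_clips]
```

In a pipeline of the kind the abstract outlines, the vectors in `features` would then be passed to a classifier such as an SVM, or the stacked layers would be fine-tuned together with a discriminative output layer.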

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Like Xue & Feng Su

Editor information

Editors and Affiliations

University of Technology, P.O. Box 123, 2007, Sydney, NSW, Australia
Xiangjian He, Dacheng Tao & Muhammad Abul Hasan
University of Newcastle, University Dr, Callaghan, 2308, NSW, Australia
Suhuai Luo
National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, P.R. China
Changsheng Xu
Shanghai Jiao Tong University, 800 Dong Chuan Rd, 200240, Shanghai, China
Jie Yang

About this paper

Cite this paper

Xue, L., Su, F. (2015). Auditory Scene Classification with Deep Belief Network. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8935. Springer, Cham. https://doi.org/10.1007/978-3-319-14445-0_30

DOI: https://doi.org/10.1007/978-3-319-14445-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14444-3
Online ISBN: 978-3-319-14445-0
eBook Packages: Computer Science, Computer Science (R0)
