Auditory Scene Classification with Deep Belief Network

  • Conference paper
MultiMedia Modeling (MMM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8935)

Abstract

Effective modeling and analysis of auditory scenes is crucial to many context-aware and content-based multimedia applications. In this paper, we explore how effectively a multi-layer generative deep neural network can discover the underlying higher-level, highly non-linear probabilistic representations in acoustic data from unstructured auditory scenes. We first create a more compact and representative description of the input audio clip by focusing on the salient regions of the data and modeling their contextual correlations. Next, we use a deep belief network (DBN) to discover, without supervision, high-level descriptions of the scene audio in the form of the activations of units in the higher hidden layers of the trained DBN. These descriptions are finally classified into a scene category by either the discriminative output layer of the DBN or a separate classifier such as a support vector machine (SVM). The experiments show the effectiveness of the proposed DBN-based classification approach for auditory scenes.
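The unsupervised stage described in the abstract can be illustrated with a minimal sketch of greedy layer-wise DBN pretraining. The abstract does not give the unit types, layer sizes, training schedule, or input features, so everything here is an assumption made for illustration: Bernoulli units, one-step contrastive divergence (CD-1), the layer sizes `[12, 6]`, and random toy data standing in for per-clip acoustic features scaled to [0, 1]. The names `RBM`, `pretrain_dbn`, and `dbn_features` are hypothetical and do not come from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1 (assumed setup)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def fit(self, X, epochs=20, lr=0.1, batch_size=16):
        for _ in range(epochs):
            order = self.rng.permutation(len(X))
            for start in range(0, len(X), batch_size):
                v0 = X[order[start:start + batch_size]]
                # Positive phase: hidden activations driven by the data.
                h0 = self.hidden_probs(v0)
                h_sample = (self.rng.random(h0.shape) < h0).astype(float)
                # Negative phase: one Gibbs step (reconstruct, then re-infer).
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                n = len(v0)
                self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
                self.b_v += lr * (v0 - v1).mean(axis=0)
                self.b_h += lr * (h0 - h1).mean(axis=0)
        return self


def pretrain_dbn(X, layer_sizes, seed=0):
    """Train one RBM per layer, feeding each layer's activations to the next."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng).fit(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms


def dbn_features(rbms, X):
    """Propagate inputs upward; the top-layer activations are the learned description."""
    for rbm in rbms:
        X = rbm.hidden_probs(X)
    return X


# Toy data only: 64 "clips" with 20 features each, already in [0, 1].
toy_rng = np.random.default_rng(42)
X = toy_rng.random((64, 20))
rbms = pretrain_dbn(X, layer_sizes=[12, 6])
features = dbn_features(rbms, X)
print(features.shape)  # (64, 6)
```

In this sketch the top-layer activations in `features` play the role of the high-level descriptions; they could then be passed to a separate classifier such as an SVM, or the stack could be fine-tuned with a discriminative output layer, which are the two options the abstract names.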

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xue, L., Su, F. (2015). Auditory Scene Classification with Deep Belief Network. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8935. Springer, Cham. https://doi.org/10.1007/978-3-319-14445-0_30

  • DOI: https://doi.org/10.1007/978-3-319-14445-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14444-3

  • Online ISBN: 978-3-319-14445-0

  • eBook Packages: Computer Science, Computer Science (R0)
