Semi-supervised Learning of Bottleneck Feature for Music Genre Classification

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 663))

Abstract

A good representation of the audio signal is important for music genre classification. Deep neural networks (DNNs) make it possible to learn such representations directly from data. The representation learned by a DNN, known as the bottleneck feature, is widely used in speech and audio applications. In general, however, a large amount of transcribed data is needed to learn an effective bottleneck feature extractor, while in practice the amount of transcribed data is often limited. In this paper, we investigate semi-supervised learning to train a bottleneck feature extractor for music data; the bottleneck feature is then used for music genre classification. Since the target dataset contains too few examples to train a reliable bottleneck DNN, we train the DNN bottleneck extractor on a large out-of-domain, un-transcribed dataset in a semi-supervised way. Experimental results show that with the learned bottleneck feature, the proposed system outperforms state-of-the-art methods on the GTZAN dataset.
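To illustrate the idea of a bottleneck feature, the sketch below builds a small feed-forward DNN whose narrow middle layer is tapped for features. This is a hypothetical minimal example, not the paper's implementation: the layer sizes (39-dimensional input, 512-unit hidden layers, a 40-dimensional bottleneck) and the random stand-in weights are illustrative assumptions; in the paper the network would be trained on pseudo-labels from the un-transcribed out-of-domain data before extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    # Small random weights stand in for trained parameters.
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)

# 39-dim input frames (e.g. MFCCs with deltas), two wide hidden layers,
# a narrow 40-dim bottleneck, then layers up to a softmax-sized output
# over pseudo-labels (100 here, arbitrarily).
dims = [39, 512, 512, 40, 512, 100]
params = [layer(a, b) for a, b in zip(dims[:-1], dims[1:])]
BOTTLENECK_LAYER = 2  # index of the layer whose output is 40-dim

def forward(x, upto=None):
    # Run the network through the first `upto` layers (all if None).
    h = x
    for W, b in params[:upto]:
        h = np.tanh(h @ W + b)
    return h

def bottleneck_features(frames):
    # Discard the layers above the bottleneck; its activations
    # are the learned feature used for genre classification.
    return forward(frames, upto=BOTTLENECK_LAYER + 1)

frames = rng.standard_normal((10, 39))  # 10 audio frames
feats = bottleneck_features(frames)
print(feats.shape)  # (10, 40)
```

After extraction, each frame is represented by its 40-dimensional bottleneck activation, and any standard classifier can be trained on these features for the genre labels of the small in-domain dataset.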

Acknowledgements

This research was supported by the China National Nature Science Foundation (Nos. 61573357, 61503382, 61403370, 61273267, 91120303 and 61305027), and by a technical development project of the State Grid Corporation of China entitled "Machine learning based research and application of key technology for multi-media recognition and stream processing".

Author information

Corresponding author

Correspondence to Wenju Liu.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dai, J., Liu, W., Zheng, H., Xue, W., Ni, C. (2016). Semi-supervised Learning of Bottleneck Feature for Music Genre Classification. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_45

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_45

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5
