Abstract
A good audio representation is important for music genre classification, and deep neural networks (DNNs) provide an effective way to learn such representations. The representation learned by a DNN, known as the bottleneck feature, is widely used in speech and audio applications. In general, however, a large amount of transcribed data is needed to train an effective bottleneck feature extractor, while in practice the amount of transcribed data is often limited. In this paper, we investigate semi-supervised learning to train a bottleneck feature extractor for music data; the bottleneck feature is then used for music genre classification. Since the target dataset contains too little data to train a reliable bottleneck DNN, we train the bottleneck extractor on a large out-of-domain, un-transcribed dataset in a semi-supervised way. Experimental results show that, with the learned bottleneck feature, the proposed system outperforms the best state-of-the-art methods on the GTZAN dataset.
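To make the pipeline the abstract describes concrete, the following is a minimal, illustrative Python/PyTorch sketch, not the authors' implementation: it assumes pseudo-labels obtained by clustering as a stand-in for the paper's semi-supervised training targets, uses random placeholder arrays in place of real audio features, and picks arbitrary layer sizes and an SVM back-end purely for illustration.

# Minimal sketch of a semi-supervised bottleneck-feature pipeline (illustrative only).
# All data, layer sizes, the clustering-based pseudo-labeling, and the SVM classifier
# are assumptions for this example, not details taken from the paper.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from sklearn.svm import SVC

torch.manual_seed(0)
np.random.seed(0)

# Placeholder frame-level features: a large un-transcribed out-of-domain set
# and a small labeled target set (10 genre classes).
unlabeled = torch.randn(2000, 39)
target_x  = torch.randn(200, 39)
target_y  = np.random.randint(0, 10, size=200)

# Semi-supervised step: derive pseudo-labels for the un-transcribed data
# (here via k-means clustering, one simple stand-in for a pseudo-labeling scheme).
pseudo_y = torch.tensor(
    KMeans(n_clusters=50, n_init=10).fit_predict(unlabeled.numpy()),
    dtype=torch.long)

# DNN with a narrow bottleneck layer; the bottleneck activations are the feature.
class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=39, bn_dim=40, n_targets=50):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                   nn.Linear(512, 512), nn.ReLU())
        self.bottleneck = nn.Linear(512, bn_dim)
        self.back = nn.Sequential(nn.ReLU(), nn.Linear(bn_dim, n_targets))

    def forward(self, x):
        bn = self.bottleneck(self.front(x))
        return self.back(bn), bn

model = BottleneckDNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train the extractor on the pseudo-labeled out-of-domain data.
for _ in range(20):
    opt.zero_grad()
    logits, _ = model(unlabeled)
    loss = loss_fn(logits, pseudo_y)
    loss.backward()
    opt.step()

# Extract bottleneck features for the target data and train a genre classifier on them.
with torch.no_grad():
    _, bn_feats = model(target_x)
clf = SVC().fit(bn_feats.numpy(), target_y)
print("train accuracy:", clf.score(bn_feats.numpy(), target_y))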
Acknowledgements
This research was supported by the China National Natural Science Foundation (No. 61573357, No. 61503382, No. 61403370, No. 61273267, No. 91120303, and No. 61305027) and by the technical development project of the State Grid Corporation of China entitled "Machine learning based research and application of key technology for multi-media recognition and stream processing".
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Dai, J., Liu, W., Zheng, H., Xue, W., Ni, C. (2016). Semi-supervised Learning of Bottleneck Feature for Music Genre Classification. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_45
DOI: https://doi.org/10.1007/978-981-10-3005-5_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)