Abstract
A good audio representation is important for music genre classification, and deep neural networks (DNNs) provide an effective way to learn such representations. The representation learned by a DNN, known as the bottleneck feature, is widely used in speech and audio applications. In general, however, a large amount of transcribed data is needed to train an effective bottleneck feature extractor, while in practice the amount of transcribed data is often limited. In this paper, we investigate semi-supervised learning to train a bottleneck feature extractor for music data; the bottleneck feature is then used for music genre classification. Since the target dataset contains too little data to train a reliable bottleneck DNN, we train the bottleneck extractor on a large out-of-domain, un-transcribed dataset in a semi-supervised way. Experimental results show that, with the learned bottleneck feature, the proposed system outperforms the best state-of-the-art methods on the GTZAN dataset.
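To make the pipeline the abstract describes concrete, the following is a minimal, illustrative Python/PyTorch sketch, not the authors' implementation: it assumes pseudo-labels obtained by clustering as a stand-in for the paper's semi-supervised training targets, uses random placeholder arrays in place of real audio features, and picks arbitrary layer sizes and an SVM back-end purely for illustration.

# Minimal sketch of a semi-supervised bottleneck-feature pipeline (illustrative only).
# All data, layer sizes, the clustering-based pseudo-labeling, and the SVM classifier
# are assumptions for this example, not details taken from the paper.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from sklearn.svm import SVC

torch.manual_seed(0)
np.random.seed(0)

# Placeholder frame-level features: a large un-transcribed out-of-domain set
# and a small labeled target set (10 genre classes).
unlabeled = torch.randn(2000, 39)
target_x  = torch.randn(200, 39)
target_y  = np.random.randint(0, 10, size=200)

# Semi-supervised step: derive pseudo-labels for the un-transcribed data
# (here via k-means clustering, one simple stand-in for a pseudo-labeling scheme).
pseudo_y = torch.tensor(
    KMeans(n_clusters=50, n_init=10).fit_predict(unlabeled.numpy()),
    dtype=torch.long)

# DNN with a narrow bottleneck layer; the bottleneck activations are the feature.
class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=39, bn_dim=40, n_targets=50):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                   nn.Linear(512, 512), nn.ReLU())
        self.bottleneck = nn.Linear(512, bn_dim)
        self.back = nn.Sequential(nn.ReLU(), nn.Linear(bn_dim, n_targets))

    def forward(self, x):
        bn = self.bottleneck(self.front(x))
        return self.back(bn), bn

model = BottleneckDNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train the extractor on the pseudo-labeled out-of-domain data.
for _ in range(20):
    opt.zero_grad()
    logits, _ = model(unlabeled)
    loss = loss_fn(logits, pseudo_y)
    loss.backward()
    opt.step()

# Extract bottleneck features for the target data and train a genre classifier on them.
with torch.no_grad():
    _, bn_feats = model(target_x)
clf = SVC().fit(bn_feats.numpy(), target_y)
print("train accuracy:", clf.score(bn_feats.numpy(), target_y))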
Acknowledgements
This research was supported by the China National Natural Science Foundation (No. 61573357, No. 61503382, No. 61403370, No. 61273267, No. 91120303, and No. 61305027) and by the technical development project of the State Grid Corporation of China entitled "Machine learning based research and application of key technology for multi-media recognition and stream processing".
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Dai, J., Liu, W., Zheng, H., Xue, W., Ni, C. (2016). Semi-supervised Learning of Bottleneck Feature for Music Genre Classification. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_45
DOI: https://doi.org/10.1007/978-981-10-3005-5_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)