A Bag-of-Tones Model with MFCC Features for Musical Genre Classification

Qin, Zengchang; Liu, Wei; Wan, Tao

doi:10.1007/978-3-642-53914-5_48

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Musical genres are categorical labels created by humans to characterize pieces of music. These labels may be highly subjective, but they are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. In this paper, we propose a new model for music genre classification, referred to as the bag-of-tones (BOT) model. It is conceptually similar to the bag-of-words (BOW) model in natural language processing and the bag-of-features (BOF) model in image processing. Basic low-level music features such as Mel-frequency cepstral coefficients (MFCC) are clustered into a set of codewords referred to as “tones”. With this model, each piece of music can be represented by a new feature vector: its distribution over the tones. Classical machine learning models such as support vector machines (SVM) can then be applied to this representation for genre classification. The model is tested on two datasets. Among the kernel functions compared in the SVM classification, the polynomial kernel gives the best performance. Compared with previous work, the proposed model outperforms classical models on a given benchmark dataset. More generally, this model can be used to organize the large collections of music available on the Web, and it can play an important role in automatic digital music categorization and retrieval.
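The pipeline described in the abstract can be sketched in plain Python: cluster MFCC frames into tones, represent each piece as a tone histogram, and compare pieces with a polynomial kernel for an SVM. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the clustering algorithm, the number of tones, the histogram normalization, or the kernel parameters. This sketch therefore assumes plain k-means with a deterministic farthest-first initialization, frequency-normalized histograms, and hypothetical kernel defaults `degree=3`, `gamma=1.0`, `coef0=1.0`. MFCC extraction is assumed to happen elsewhere, so each frame is just a list of floats. All function names (`learn_tones`, `bag_of_tones`, `polynomial_kernel`, `gram_matrix`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two frames or codewords."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the tone (codeword) closest to a frame; ties go to the lower index."""
    return min(range(len(codebook)), key=lambda j: _sq_dist(frame, codebook[j]))


def learn_tones(frames: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[Vector]:
    """Cluster pooled MFCC frames into k tones with plain k-means (assumed method)."""
    if not 0 < k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    # Farthest-first initialization (assumption, chosen for determinism): start from
    # the first frame, then repeatedly add the frame farthest from all chosen codewords.
    codebook: List[Vector] = [list(frames[0])]
    while len(codebook) < k:
        far = max(frames, key=lambda f: min(_sq_dist(f, c) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        # Assignment step: group frames by their nearest tone.
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for f in frames:
            groups[_nearest(f, codebook)].append(f)
        # Update step: move each tone to the mean of its group; empty groups keep their tone.
        new_codebook = [
            [sum(col) / len(g) for col in zip(*g)] if g else codebook[j]
            for j, g in enumerate(groups)
        ]
        if new_codebook == codebook:
            break  # assignments are stable, so k-means has converged
        codebook = new_codebook
    return codebook


def bag_of_tones(frames: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> Vector:
    """Represent one piece of music as its normalized distribution over tones."""
    if not frames:
        raise ValueError("a piece needs at least one frame")
    counts = [0] * len(codebook)
    for f in frames:
        counts[_nearest(f, codebook)] += 1
    return [c / len(frames) for c in counts]


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 3, gamma: float = 1.0, coef0: float = 1.0) -> float:
    """K(x, y) = (gamma * <x, y> + coef0) ** degree; parameter values are assumptions."""
    return (gamma * sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def gram_matrix(hists: Sequence[Sequence[float]], **kw: float) -> List[Vector]:
    """Pairwise polynomial-kernel values between tone histograms (input to an SVM)."""
    return [[polynomial_kernel(a, b, **kw) for b in hists] for a in hists]


if __name__ == "__main__":
    # Toy 2-D "MFCC" frames for two pieces (real MFCC frames usually have more coefficients).
    piece_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]]
    piece_b = [[5.1, 4.9], [4.9, 5.0], [0.0, 0.0]]
    tones = learn_tones(piece_a + piece_b, k=2)
    hists = [bag_of_tones(p, tones) for p in (piece_a, piece_b)]
    print(hists)               # one tone distribution per piece
    print(gram_matrix(hists))  # symmetric kernel matrix
```

The matrix returned by `gram_matrix` could be passed to any SVM solver that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`. Farthest-first initialization is used here only to keep the toy example deterministic. Random or k-means++ seeding would be the more usual choice on real MFCC data.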

Author information

Authors and Affiliations

Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, 100191, China
Zengchang Qin & Wei Liu
School of Advanced Engineering, Beihang University, Beijing, 100191, China
Wei Liu
School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
Tao Wan
Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, 44106, USA
Tao Wan

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

Qin, Z., Liu, W., Wan, T. (2013). A Bag-of-Tones Model with MFCC Features for Musical Genre Classification. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_48

DOI: https://doi.org/10.1007/978-3-642-53914-5_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)