Genre Classification and the Invariance of MFCC Features to Key and Tempo

Li, Tom L. H.; Chan, Antoni B.

doi:10.1007/978-3-642-17832-0_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6523)

Abstract

Musical genre classification is a promising yet difficult task in the field of music information retrieval. Mel-frequency cepstral coefficients (MFCCs), a widely used feature in genre classification systems, are typically believed to encode timbral information, since they represent short-duration musical textures. In this paper, we investigate the invariance of MFCCs to musical key and tempo, and show that MFCCs in fact encode both timbral and key information. We also show that musical genres, which should be independent of key, are in fact influenced by the fundamental keys of the instruments involved. As a result, genre classifiers based on MFCC features are influenced by the dominant keys of each genre and perform poorly on songs in less common keys. We propose an approach to address this problem, which consists of augmenting classifier training and prediction with various key and tempo transformations of the songs. The resulting genre classifier is invariant to key, and thus more timbre-oriented, which improves classification accuracy in our experiments.
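The augmentation idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the shift and tempo values, the majority-vote aggregation, and the `transform` and `classify` callables are all hypothetical placeholders, since the abstract does not specify them. The only fixed fact used is that a shift of k equal-tempered semitones scales frequency by 2^(k/12).

```python
from collections import Counter
from itertools import product

# Hypothetical augmentation grid; the abstract does not give the actual values.
SEMITONE_SHIFTS = (-2, -1, 0, 1, 2)
TEMPO_FACTORS = (0.9, 1.0, 1.1)


def semitone_ratio(shift):
    """Frequency ratio for a shift of `shift` equal-tempered semitones."""
    return 2.0 ** (shift / 12.0)


def transformation_grid(shifts=SEMITONE_SHIFTS, tempos=TEMPO_FACTORS):
    """All (key shift, tempo factor) pairs, including the identity (0, 1.0)."""
    return list(product(shifts, tempos))


def predict_with_augmentation(song, transform, classify, grid=None):
    """Classify every transformed copy of `song` and return the majority label.

    `transform(song, shift, tempo)` and `classify(clip)` are caller-supplied
    placeholders. Majority voting is an assumed aggregation rule.
    """
    grid = transformation_grid() if grid is None else grid
    votes = Counter(classify(transform(song, s, t)) for s, t in grid)
    return votes.most_common(1)[0][0]


# Toy stand-ins so the sketch runs without any audio library.
def toy_transform(song, shift, tempo):
    return (song, shift, tempo)


def toy_classify(clip):
    # Pretend the classifier is key-sensitive: downward shifts flip the label.
    return "rock" if clip[1] >= 0 else "jazz"


label = predict_with_augmentation("song.wav", toy_transform, toy_classify)
```

In a real system, `transform` would apply pitch shifting and time stretching to the audio and `classify` would run an MFCC-based model. The same grid could be applied to the training set so the classifier sees each song in several keys and tempos.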

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong
Tom L. H. Li & Antoni B. Chan

Editor information

Editors and Affiliations

National Taiwan Ocean University, Keelung, Taiwan
Kuo-Tien Lee & Jun-Wei Hsieh
National Chiao Tung University, Hsinchu, Taiwan
Wen-Hsiang Tsai
Academia Sinica, Taipei, Taiwan
Hong-Yuan Mark Liao
Cornell University, Ithaca, NY, USA
Tsuhan Chen
National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan
Chien-Cheng Tseng

Cite this paper

Li, T.L.H., Chan, A.B. (2011). Genre Classification and the Invariance of MFCC Features to Key and Tempo. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_30

DOI: https://doi.org/10.1007/978-3-642-17832-0_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17831-3
Online ISBN: 978-3-642-17832-0
eBook Packages: Computer Science (R0)
