Abstract
Musical genre classification is a promising yet difficult task in the field of music information retrieval. Mel-frequency cepstral coefficients (MFCCs), a widely used feature in genre classification systems, are typically believed to encode timbral information, since they represent short-duration musical textures. In this paper, we investigate the invariance of MFCCs to musical key and tempo, and show that MFCCs in fact encode both timbral and key information. We also show that musical genres, which should be independent of key, are in fact influenced by the fundamental keys of the instruments involved. As a result, genre classifiers based on MFCC features will be influenced by the dominant keys of each genre, resulting in poor performance on songs in less common keys. We propose an approach to address this problem: augmenting both classifier training and prediction with various key and tempo transformations of the songs. The resulting genre classifier is invariant to key, and thus more timbre-oriented, and achieves improved classification accuracy in our experiments.
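The augmentation idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Songs are represented as hypothetical lists of (frequency in Hz, duration in seconds) notes, and `toy_transform` is a hypothetical stand-in for real audio pitch-shifting and time-stretching followed by MFCC re-extraction. Summing per-genre classifier scores across all variants of a test song is an assumed aggregation rule. The one fixed fact used is that a shift of k semitones in 12-tone equal temperament multiplies every frequency by 2^(k/12).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy representation: a song as (frequency_hz, duration_s) notes.
# A real system would transform audio and re-extract MFCCs for each variant.
Note = Tuple[float, float]
Song = List[Note]
Transform = Callable[[Song, int, float], Song]
Score = Callable[[Song, str], float]


def semitone_ratio(k: int) -> float:
    """Frequency ratio for a shift of k semitones in 12-tone equal temperament."""
    return 2.0 ** (k / 12.0)


def toy_transform(song: Song, key_shift: int, tempo_factor: float) -> Song:
    """Hypothetical stand-in for pitch shifting plus time stretching.

    Scales every frequency by the semitone ratio and divides every duration
    by the tempo factor (tempo_factor > 1 means faster playback).
    """
    r = semitone_ratio(key_shift)
    return [(f * r, d / tempo_factor) for f, d in song]


def augment_training_set(
    songs: Sequence[Song],
    labels: Sequence[str],
    transform: Transform,
    key_shifts: Sequence[int],
    tempo_factors: Sequence[float],
) -> Tuple[List[Song], List[str]]:
    """Return every (key, tempo) variant of every song, each keeping its label."""
    aug_songs: List[Song] = []
    aug_labels: List[str] = []
    for song, label in zip(songs, labels):
        for k in key_shifts:
            for t in tempo_factors:
                aug_songs.append(transform(song, k, t))
                aug_labels.append(label)
    return aug_songs, aug_labels


def predict_invariant(
    song: Song,
    transform: Transform,
    score: Score,
    genres: Sequence[str],
    key_shifts: Sequence[int],
    tempo_factors: Sequence[float],
) -> str:
    """Sum per-genre scores over all variants of a test song; return the best genre.

    `score(variant, genre)` is a hypothetical classifier score, for example a
    log-likelihood of the variant's features under that genre's model.
    """
    totals: Dict[str, float] = {g: 0.0 for g in genres}
    for k in key_shifts:
        for t in tempo_factors:
            variant = transform(song, k, t)
            for g in genres:
                totals[g] += score(variant, g)
    return max(genres, key=lambda g: totals[g])
```

Because every training song appears at every key shift and tempo factor in the grid, the classifier sees each genre in many keys rather than only in its dominant ones, which is the mechanism the abstract credits for reducing key bias. Applying the same grid at prediction time means a test song in an uncommon key is also compared in its shifted versions.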
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, T.L.H., Chan, A.B. (2011). Genre Classification and the Invariance of MFCC Features to Key and Tempo. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_30
DOI: https://doi.org/10.1007/978-3-642-17832-0_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17831-3
Online ISBN: 978-3-642-17832-0
eBook Packages: Computer Science, Computer Science (R0)