Genre Classification and the Invariance of MFCC Features to Key and Tempo

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6523)

Abstract

Musical genre classification is a promising yet difficult task in the field of music information retrieval. Mel-frequency cepstral coefficients (MFCCs) are widely used features in genre classification systems and are typically believed to encode timbral information, since they represent short-duration musical textures. In this paper, we investigate the invariance of MFCCs to musical key and tempo, and show that MFCCs in fact encode both timbral and key information. We also show that musical genres, which should be independent of key, are in fact influenced by the fundamental keys of the instruments involved. As a result, genre classifiers based on MFCC features will be influenced by the dominant keys of each genre, resulting in poor performance on songs in less common keys. We propose an approach to address this problem, which consists of augmenting classifier training and prediction with various key and tempo transformations of the songs. The resulting genre classifier is invariant to key, and thus more timbre-oriented, resulting in improved classification accuracy in our experiments.
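The abstract's two technical claims, that MFCCs change when a song is transposed and that training and prediction can be augmented with transposed copies, can be illustrated with the toy sketch below. It is not the authors' implementation, and every name in it is hypothetical. Assumptions: each "song" is a single static spectrum given as (frequency, amplitude) partials rather than audio; the MFCC-like features use an HTK-style mel filterbank followed by a DCT-II; a pitch shift of s semitones scales every frequency by 2^(s/12); the ±2 semitone range in `SHIFTS` is an assumed value; and a nearest-centroid classifier with majority voting stands in for whatever classifier the paper actually uses. Tempo transformations are omitted because a static spectrum has no time axis.

```python
import math
from collections import Counter

# Assumed transposition range in semitones (hypothetical, not taken from the paper).
SHIFTS = range(-2, 3)

def semitone_ratio(semitones):
    """Frequency scale factor for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # HTK-style mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def toy_mfcc(partials, n_filters=20, n_coeffs=13, f_max=8000.0):
    """MFCC-like vector for one static spectrum of (frequency_hz, amplitude) pairs:
    triangular mel filterbank -> log band energies -> DCT-II (toy, not STFT-based)."""
    mel_max = hz_to_mel(f_max)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    log_e = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for f, a in partials:
            if lo < f <= mid:      # rising edge of triangle j
                e += a * a * (f - lo) / (mid - lo)
            elif mid < f < hi:     # falling edge of triangle j
                e += a * a * (hi - f) / (hi - mid)
        log_e.append(math.log(e + 1e-10))  # floor avoids log(0) for empty bands
    return [sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for k in range(n_coeffs)]

def harmonic_tone(f0, n_harmonics=10, rolloff=1.0):
    """Toy 'instrument': harmonics of f0 with amplitude 1/h**rolloff (the timbre)."""
    return [(f0 * h, 1.0 / h ** rolloff) for h in range(1, n_harmonics + 1)]

def pitch_shift(partials, semitones):
    """Key transposition: scale every partial frequency, keep the amplitudes."""
    r = semitone_ratio(semitones)
    return [(f * r, a) for f, a in partials]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def train_centroids(songs, labels, shifts=SHIFTS):
    """Training-time augmentation: every transposed copy of a song joins its class."""
    sums, counts = {}, Counter()
    for partials, label in zip(songs, labels):
        for s in shifts:
            x = toy_mfcc(pitch_shift(partials, s))
            prev = sums.get(label, [0.0] * len(x))
            sums[label] = [p + v for p, v in zip(prev, x)]
            counts[label] += 1
    return {g: [v / counts[g] for v in sums[g]] for g in sums}

def predict(centroids, partials, shifts=SHIFTS):
    """Prediction-time augmentation: nearest-centroid vote over transposed copies."""
    votes = Counter()
    for s in shifts:
        x = toy_mfcc(pitch_shift(partials, s))
        votes[min(centroids, key=lambda g: distance(centroids[g], x))] += 1
    return votes.most_common(1)[0][0]
```

For example, `toy_mfcc(harmonic_tone(220.0))` and `toy_mfcc(pitch_shift(harmonic_tone(220.0), 1))` differ even though the timbre (the 1/h amplitude roll-off) is unchanged, which is the key sensitivity the abstract describes. Majority voting over transposed copies is only one simple way to aggregate at prediction time; averaging per-class distances or scores would be an equally plausible choice.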

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, T.L.H., Chan, A.B. (2011). Genre Classification and the Invariance of MFCC Features to Key and Tempo. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_30

  • DOI: https://doi.org/10.1007/978-3-642-17832-0_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17831-3

  • Online ISBN: 978-3-642-17832-0

  • eBook Packages: Computer Science, Computer Science (R0)