Feature learning and deep architectures: new directions for music informatics

Humphrey, Eric J.; Bello, Juan P.; LeCun, Yann

doi:10.1007/s10844-013-0248-5

Published: 12 July 2013

Journal of Intelligent Information Systems
Volume 41, pages 461–481 (2013)

Abstract

As we look to advance the state of the art in content-based music informatics, there is a general sense that progress is decelerating throughout the field. On closer inspection, performance trajectories across several applications reveal that this is indeed the case, raising some difficult questions for the discipline: why are we slowing down, and what can we do about it? Here, we strive to address both of these concerns. First, we critically review the standard approach to music signal analysis and identify three specific deficiencies of current methods: hand-crafted feature design is sub-optimal and unsustainable, the power of shallow architectures is fundamentally limited, and short-time analysis cannot encode musically meaningful structure. Acknowledging breakthroughs in other perceptual AI domains, we argue that deep learning holds the potential to overcome each of these obstacles. Through conceptual arguments for feature learning and deeper processing architectures, we demonstrate how deep processing models are more powerful extensions of current methods, and why now is the time for this paradigm shift. Finally, we conclude with a discussion of current challenges and of the potential impact, to further motivate exploration of this promising research area.

Notes

Music Information Retrieval Evaluation eXchange (MIREX): http://www.music-ir.org/mirex/.
Million Song Dataset.
MIR Toolbox, Chroma Toolbox, MARSYAS, Echonest API.

Author information

Authors and Affiliations

Music and Audio Research Laboratory (MARL), New York University, 35 West 4th St., New York, NY, 10003, USA
Eric J. Humphrey & Juan P. Bello
Courant Institute, New York University, 35 West 4th St., New York, NY, 10003, USA
Yann LeCun

Corresponding author

Correspondence to Eric J. Humphrey.

About this article

Cite this article

Humphrey, E.J., Bello, J.P. & LeCun, Y. Feature learning and deep architectures: new directions for music informatics. J Intell Inf Syst 41, 461–481 (2013). https://doi.org/10.1007/s10844-013-0248-5

Received: 19 November 2012
Revised: 10 March 2013
Accepted: 09 May 2013
Issue Date: December 2013
