Using Oracle Analysis for Decomposition-Based Automatic Music Transcription

O’Hanlon, Ken; Nagano, Hidehisa; Plumbley, Mark D.

doi:10.1007/978-3-642-41248-6_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7900)

Included in the following conference series:

International Symposium on Computer Music Modeling and Retrieval

Abstract

One approach to Automatic Music Transcription (AMT) is to decompose a spectrogram with a dictionary matrix in which each column contains a pitch-labelled note spectrum atom. AMT performance is typically measured using frame-based comparison; an alternative perspective is event-based analysis. We have previously proposed an AMT system based on structured sparse representations. The method is described and experimental results are given, which appear promising. Inspection of the piano roll, the graphical output of an AMT system, suggests that performance may be slightly better than the AMT metrics indicate. This leads us to perform an oracle analysis of the AMT system, with some interesting outcomes that may have implications for decomposition-based AMT in general.
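
To make the decomposition and the frame-based evaluation concrete, the following is a minimal sketch and not the authors' structured sparse method. It assumes a fixed, non-negative dictionary `D` and estimates activations with generic multiplicative updates for the Euclidean cost; the toy dictionary, the relative threshold, and the function names `decompose`, `piano_roll`, and `frame_metrics` are all hypothetical choices made for illustration.

```python
import numpy as np


def decompose(S, D, n_iter=200, eps=1e-12):
    """Estimate non-negative activations X such that S ~ D @ X, with D held fixed."""
    X = np.ones((D.shape[1], S.shape[1]))
    DtS = D.T @ S
    DtD = D.T @ D
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; X stays non-negative.
        X *= DtS / (DtD @ X + eps)
    return X


def piano_roll(X, rel_threshold=0.1):
    """Binarise activations: a pitch is 'on' where it exceeds a fraction of the peak."""
    return X > rel_threshold * X.max()


def frame_metrics(est, ref):
    """Frame-based precision, recall and F-measure between two boolean piano rolls."""
    est = np.asarray(est, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = int(np.sum(est & ref))
    fp = int(np.sum(est & ~ref))
    fn = int(np.sum(~est & ref))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example (assumed data): 3 pitch-labelled atoms over 6 frequency bins,
# with some shared bins to mimic overlapping harmonics.
D = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 0.0],
    [0.3, 0.0, 0.5],
    [0.0, 0.3, 0.0],
])
# Ground-truth piano roll: rows are pitches, columns are time frames.
ref = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=bool)
S = D @ ref.astype(float)  # synthetic magnitude spectrogram

est = piano_roll(decompose(S, D))
print(frame_metrics(est, ref))
```

Frame-based metrics of this kind count each time-pitch cell independently. An event-based analysis would instead match note onsets (and possibly offsets) between the estimated and reference piano rolls, which this sketch does not attempt.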

Author information

Authors and Affiliations

Queen Mary University of London, UK
Ken O’Hanlon, Hidehisa Nagano & Mark D. Plumbley
NTT Communication Science Laboratories, NTT Corporation, UK
Hidehisa Nagano

Editor information

Editors and Affiliations

CNRS - LMA, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Mitsuko Aramaki, Richard Kronland-Martinet & Sølvi Ystad
Centre for Digital Music, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
Mathieu Barthet

About this paper

Cite this paper

O’Hanlon, K., Nagano, H., Plumbley, M.D. (2013). Using Oracle Analysis for Decomposition-Based Automatic Music Transcription. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_19

DOI: https://doi.org/10.1007/978-3-642-41248-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41247-9
Online ISBN: 978-3-642-41248-6
eBook Packages: Computer Science, Computer Science (R0)
