Using Oracle Analysis for Decomposition-Based Automatic Music Transcription

  • Conference paper
From Sounds to Music and Emotions (CMMR 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7900)

Abstract

One approach to Automatic Music Transcription (AMT) is to decompose a spectrogram with a dictionary matrix in which each column contains a pitch-labelled note spectrum atom. AMT performance is typically measured using frame-based comparison; an alternative perspective is event-based analysis. We have previously proposed an AMT system based on structured sparse representations. The method is described and experimental results are given, which are seen to be promising. Inspection of the graphical AMT output, known as a piano roll, suggests that performance may be slightly better than the AMT metrics indicate. This leads us to perform an oracle analysis of the AMT system, with some interesting outcomes that may have implications for decomposition-based AMT in general.
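
As an illustration of the decomposition and frame-based comparison mentioned in the abstract, the following is a minimal sketch and not the authors' system. It assumes a made-up toy dictionary `D`, a synthetic noiseless spectrogram `S`, and plain non-negative multiplicative updates in place of the structured sparse method of the paper; the names `decompose` and `frame_metrics` and the threshold value are hypothetical.

```python
import numpy as np

# Hypothetical toy dictionary: 6 frequency bins x 3 pitch-labelled atoms.
# The spectra are invented for illustration; atoms 0 and 2 share partials.
D = np.array([
    [1.00, 0.00, 0.00],
    [0.00, 1.00, 0.00],
    [0.50, 0.00, 1.00],
    [0.00, 0.50, 0.00],
    [0.25, 0.00, 0.50],
    [0.00, 0.25, 0.00],
])

# Reference piano roll (pitches x frames): True where a note is sounding.
ref_roll = np.zeros((3, 8), dtype=bool)
ref_roll[0, 0:4] = True
ref_roll[1, 2:6] = True
ref_roll[2, 3:8] = True

# Synthetic, noiseless magnitude spectrogram with unit note gains.
S = D @ ref_roll.astype(float)


def decompose(S, D, n_iter=500, eps=1e-12):
    """Estimate non-negative activations X with S ~= D @ X, keeping D fixed.

    Uses multiplicative updates for the Euclidean cost. This is a stand-in
    technique, assumed here for simplicity, not the structured sparse
    decomposition the paper refers to.
    """
    rng = np.random.default_rng(0)
    X = rng.random((D.shape[1], S.shape[1])) + eps
    DtS = D.T @ S
    DtD = D.T @ D
    for _ in range(n_iter):
        X *= DtS / (DtD @ X + eps)
    return X


def frame_metrics(est, ref):
    """Frame-based precision, recall and F-measure of two boolean piano rolls."""
    tp = int(np.sum(est & ref))    # active in both estimate and reference
    fp = int(np.sum(est & ~ref))   # active in estimate only
    fn = int(np.sum(~est & ref))   # active in reference only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


X = decompose(S, D)
est_roll = X > 0.5  # assumed threshold; real systems tune this value
precision, recall, f_measure = frame_metrics(est_roll, ref_roll)
print(precision, recall, f_measure)
```

On real recordings the dictionary atoms, the noise, and the choice of threshold all affect the result, so the clean recovery seen on this toy input should not be read as representative. An event-based analysis would instead group consecutive active frames of each pitch into notes and compare onsets, which is not shown here.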

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

O’Hanlon, K., Nagano, H., Plumbley, M.D. (2013). Using Oracle Analysis for Decomposition-Based Automatic Music Transcription. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_19

  • DOI: https://doi.org/10.1007/978-3-642-41248-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41247-9

  • Online ISBN: 978-3-642-41248-6

  • eBook Packages: Computer Science, Computer Science (R0)
