
Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription

  • Conference paper
Music, Mind, and Embodiment (CMMR 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9617)

Abstract

While recent years have witnessed considerable progress in algorithms for automatic music transcription (AMT), the development of general and sizable datasets for AMT evaluation has remained relatively stagnant, largely because manually annotating and checking such datasets is labor-intensive and time-consuming. In this paper we propose a novel note-level annotation method for building AMT datasets that leverages humans' ability to follow music in real time. To assess the quality of the resulting annotations, we further propose an efficient method for qualifying an AMT dataset based on the concepts of onset error difference and a tolerance computed from the evaluation result. Based on experiments on five piano solos and four woodwind quintets, we argue that the proposed annotation method is reliable for the evaluation of AMT algorithms.
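
The onset-only, note-level evaluation referred to in the abstract (and in Note 1 below) is conventionally scored MIREX-style: an estimated note counts as correct if its pitch matches a reference note and its onset falls within a fixed tolerance window (commonly ±50 ms), and precision, recall and F-measure are computed from the matches. The sketch below is a minimal Python illustration under those assumptions, with invented names throughout; it is not the authors' code. It also collects the per-note onset deviations from which a statistic such as the paper's onset error difference can be derived.

    # Onset-only note-tracking evaluation (MIREX-style), illustrative only.
    # A reference note is matched to an estimated note of the same pitch
    # whose onset lies within +/- tol seconds; each estimate is used once.

    def match_notes(reference, estimated, tol=0.05):
        """reference, estimated: lists of (onset_sec, midi_pitch) pairs.
        Returns (precision, recall, f_measure, onset_errors), where
        onset_errors holds |onset_ref - onset_est| for every matched pair."""
        used = set()
        onset_errors = []
        for r_onset, r_pitch in reference:
            best, best_err = None, None
            for i, (e_onset, e_pitch) in enumerate(estimated):
                if i in used or e_pitch != r_pitch:
                    continue
                err = abs(e_onset - r_onset)
                if err <= tol and (best_err is None or err < best_err):
                    best, best_err = i, err
            if best is not None:
                used.add(best)
                onset_errors.append(best_err)

        tp = len(onset_errors)
        precision = tp / len(estimated) if estimated else 0.0
        recall = tp / len(reference) if reference else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall > 0 else 0.0)
        return precision, recall, f_measure, onset_errors

    # Example: the third estimated note has the wrong pitch (false positive),
    # so the third reference note goes unmatched (false negative).
    ref = [(0.00, 60), (0.50, 64), (1.00, 67)]
    est = [(0.02, 60), (0.48, 64), (1.20, 72)]
    print(match_notes(ref, est))   # approx. (0.67, 0.67, 0.67, [0.02, 0.02])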

Notes

  1.

    In this paper we restrict the scope of AMT to the subtask of onset-only note tracking. Details can be found on the webpage of the MIREX Multi-F0 Challenge [11].

  2.

    In this paper the term “music excerpt” means a segment (usually 20–30 seconds) of audio content taken from a longer music composition.

  3.

    http://www.sonicvisualiser.org/.

  4.

    To paraphrase J.P. Bello et al. [1]: “Hand-marking is a painful and time-consuming task that leaves no room for the cross-validation of annotations.”

  5.

    Some onset detection datasets are built entirely by manual annotation, but they do not provide pitch information [10, 12].

  6.

    http://c4dm.eecs.qmul.ac.uk/rdr/handle/123456789/27.

  7.

    In our scenario we only need to find the “note name” rather than the exact “fundamental frequency”. More specifically, we do not discriminate among fundamental frequencies of 440 Hz, 438 Hz, or 442 Hz; we only need to identify “A4”. In other words, our pitch detection task allows an error of up to half a semitone (\(\pm 3\)%).

  8.

    https://sites.google.com/site/lisupage/research/new-methodology-of-building-polyphonic-datasets-for-amt.

  9.

    http://www.irisa.fr/metiss/members/evincent/multipitch_estimation.m.

  10.

    The family of \(\beta \)-divergences includes the Euclidean distance (\(\beta =2\)), the Kullback–Leibler divergence (\(\beta \rightarrow 1\)) and the Itakura–Saito divergence (\(\beta \rightarrow 0\)); a scalar sketch of this family is given after these notes.

  11.

    http://www.roland.com/products/hq_hyper_canvas/.

  12.

    We further assume that FPs and FNs are likewise caused by the AMT algorithm.

  13.

    Obviously, the onset error difference is larger than zero and smaller than the tolerance \(\delta \).

  14.

    These criteria are arbitrary and depend on how accurate the annotation needs to be in a real application.
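
Note 10 refers to the \(\beta \)-divergence family used as the reconstruction cost in NMF-based transcription such as [5]. The following is a minimal scalar sketch of the generic definition and its standard limiting cases; it is illustrative only and is not code from the paper or from the toolbox in Note 9.

    # Scalar beta-divergence d_beta(x | y) for positive x, y, with the usual
    # special cases: Itakura-Saito (beta = 0), Kullback-Leibler (beta = 1),
    # and (half) squared Euclidean distance (beta = 2).  Illustrative only.
    import math

    def beta_divergence(x, y, beta):
        if beta == 0:                      # Itakura-Saito divergence
            return x / y - math.log(x / y) - 1.0
        if beta == 1:                      # (generalized) Kullback-Leibler
            return x * math.log(x / y) - x + y
        return (x ** beta + (beta - 1) * y ** beta
                - beta * x * y ** (beta - 1)) / (beta * (beta - 1))

    # beta = 2 reduces to half the squared Euclidean distance:
    assert abs(beta_divergence(3.0, 1.0, 2.0) - 0.5 * (3.0 - 1.0) ** 2) < 1e-12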

References

  1. Bello, J.P., Daudet, L., Sandler, M.B.: Automatic piano transcription using frequency and time-domain information. IEEE Trans. Audio Speech Lang. Process. 14(6), 2242–2251 (2006)

  2. Benetos, E., Cherla, S., Weyde, T.: An efficient shift-invariant model for polyphonic music transcription. In: 6th International Workshop on Machine Learning and Music (2013)

  3. Benetos, E., Dixon, S., Giannoulis, D., Kirchhoff, H., Klapuri, A.: Automatic music transcription: challenges and future directions. J. Intell. Inf. Syst. 41(3), 407–434 (2013)

  4. de Cheveigné, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)

  5. Dessein, A., Cont, A., Lemaitre, G.: Real-time polyphonic music transcription with nonnegative matrix factorization and beta-divergence. In: 11th International Society for Music Information Retrieval Conference, pp. 489–494 (2010)

  6. Duan, Z., Pardo, B., Zhang, C.: Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions. IEEE Trans. Audio Speech Lang. Process. 18(8), 2121–2133 (2010)

  7. Emiya, V., Badeau, R., David, B.: Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Trans. Audio Speech Lang. Process. 18(6), 1643–1654 (2010)

  8. Ewert, S., Müller, M., Grosche, P.: High resolution audio synchronization using chroma onset features. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1869–1872 (2009)

  9. Fritsch, J.: High Quality Musical Audio Source Separation. Master's thesis, Centre for Digital Music, Queen Mary University of London (2012)

  10. Holzapfel, A., Stylianou, Y., Gedik, A., Bozkurt, B.: Three dimensions of pitched instrument onset detection. IEEE Trans. Audio Speech Lang. Process. 18(6), 1517–1527 (2010)

  11. MIREX 2014 Multiple Fundamental Frequency Estimation and Tracking Challenge. http://www.music-ir.org/mirex/wiki/2014:Multiple_Fundamental_Frequency_Estimation_%26_Tracking

  12. MIREX 2014 Audio Onset Detection Challenge. http://www.music-ir.org/mirex/wiki/2014:Audio_Onset_Detection

  13. Poliner, G., Ellis, D.: A discriminative model for polyphonic piano transcription. EURASIP J. Adv. Sig. Process. 8, 154–162 (2007)

  14. Vincent, E., Bertin, N., Badeau, R.: Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio Speech Lang. Process. 18(3), 528–537 (2010)

Acknowledgments

The authors would like to thank Patricia Hsu for performing the experiment on the proposed annotation method, and Che-Yuan Kevin Liang, Yuan-Ping Chen, and Pei-I Chen for checking the annotations. This work was supported by a grant from the Ministry of Science and Technology under contract MOST 102-2221-E-001-004-MY3 and by the Academia Sinica Career Development Program. Dr. Li Su was further supported by a postdoctoral fellowship from Academia Sinica.

Author information

Corresponding author

Correspondence to Li Su.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Su, L., Yang, YH. (2016). Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription. In: Kronland-Martinet, R., Aramaki, M., Ystad, S. (eds) Music, Mind, and Embodiment. CMMR 2015. Lecture Notes in Computer Science, vol 9617. Springer, Cham. https://doi.org/10.1007/978-3-319-46282-0_20

  • DOI: https://doi.org/10.1007/978-3-319-46282-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46281-3

  • Online ISBN: 978-3-319-46282-0

  • eBook Packages: Computer Science, Computer Science (R0)
