Abstract
While recent years have witnessed substantial progress in algorithms for automatic music transcription (AMT), the development of general and sizable datasets for AMT evaluation has been relatively stagnant, largely because manually annotating and checking such datasets is labor-intensive and time-consuming. In this paper we propose a novel note-level annotation method for building AMT datasets that exploits the human ability to follow music in real time. To assess the quality of the annotation, we further propose an efficient method for qualifying an AMT dataset based on the concepts of onset error difference and the tolerance computed from the evaluation result. Experiments on five piano solos and four woodwind quintets indicate that the proposed annotation method is reliable for the evaluation of AMT algorithms.
Notes
- 1. In this paper we restrict the scope of AMT to the subtask of onset-only note tracking. Details can be found on the webpage of the MIREX Multi-F0 Challenge [11].
- 2. In this paper the term “music excerpt” means a segment (usually 20–30 seconds) of audio content taken from a longer music composition.
- 4. To paraphrase J.P. Bello et al. [1]: “Hand-marking is a painful and time-consuming task that leaves no room for the cross-validation of annotations.”
- 7. In our scenario we only need to find the “note name” rather than the exact “fundamental frequency”. More specifically, we do not discriminate between fundamental frequencies of 440 Hz, 438 Hz, and 442 Hz; we only need to identify “A4”. In other words, our pitch detection task tolerates an error of up to half a semitone (\(\pm 3\)%).
- 10. The family of \(\beta \)-divergences includes the Euclidean distance (\(\beta =2\)), the Kullback–Leibler divergence (\(\beta \rightarrow 1\)), and the Itakura–Saito divergence (\(\beta \rightarrow 0\)).
- 12. We further assume that FPs and FNs are caused by the AMT algorithm.
- 13. Obviously, the onset error difference is larger than zero and smaller than the tolerance \(\delta \).
- 14. These criteria are arbitrary and depend on how accurate the annotation needs to be in the target application.
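The pitch-naming convention in note 7 — rounding any frequency within half a semitone of a nominal equal-tempered pitch to its note name — can be sketched as follows. This is an illustrative helper, not code from the paper; the function name and the A4 = 440 Hz reference are our own choices:

```python
import math

# Pitch-class names in equal temperament, indexed by MIDI number mod 12.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note_name(freq_hz, a4_hz=440.0):
    """Round a fundamental frequency to the nearest equal-tempered note name.

    Any frequency within half a semitone (about +/-3%) of a note's nominal
    frequency maps to that note, so 438, 440 and 442 Hz all yield "A4".
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))  # nearest MIDI number
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(freq_to_note_name(438.0), freq_to_note_name(440.0), freq_to_note_name(442.0))
# all three map to A4
```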
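As a companion to the \(\beta \)-divergence family mentioned in note 10, here is a minimal pointwise sketch with its three named special cases. The closed-form limits for \(\beta \rightarrow 1\) and \(\beta \rightarrow 0\) are the standard ones; note that the \(\beta =2\) case is written here as half the squared Euclidean distance, a common convention:

```python
import math

def beta_divergence(x, y, beta):
    """Pointwise beta-divergence d_beta(x, y) between two positive scalars."""
    if beta == 2:
        # (half) squared Euclidean distance
        return 0.5 * (x - y) ** 2
    if beta == 1:
        # Kullback-Leibler divergence (limit beta -> 1)
        return x * math.log(x / y) - x + y
    if beta == 0:
        # Itakura-Saito divergence (limit beta -> 0)
        return x / y - math.log(x / y) - 1
    # general closed form for beta not in {0, 1}
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))
```

In NMF-based transcription the divergence is summed over all time-frequency bins of the spectrogram and its factorized approximation; the choice of \(\beta \) controls how strongly large-magnitude bins dominate the fit.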
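The bound in note 13 follows from how onsets are matched under a tolerance: only estimated onsets within \(\pm \delta \) of a reference onset can count as true positives, so every matched onset error lies in \([0, \delta ]\). The sketch below shows MIREX-style onset matching; the function name, the greedy nearest-match strategy, and the 50 ms default are our assumptions for illustration, not the paper's exact procedure:

```python
def match_onsets(reference, estimated, delta=0.05):
    """Match estimated onsets (in seconds) to reference onsets within +/-delta.

    Returns the absolute onset errors of matched pairs (each in [0, delta]),
    plus the counts of false positives and false negatives.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    errors, used = [], set()
    for r in reference:
        # nearest unmatched estimate within the tolerance window
        best, best_err = None, delta
        for j, e in enumerate(estimated):
            if j in used:
                continue
            err = abs(e - r)
            if err <= best_err:
                best, best_err = j, err
        if best is not None:
            used.add(best)
            errors.append(best_err)
    fn = len(reference) - len(errors)  # unmatched reference notes
    fp = len(estimated) - len(errors)  # unmatched estimates
    return errors, fp, fn
```

For example, with `reference = [0.0, 1.0, 2.0]` and `estimated = [0.01, 1.2, 2.03]`, the 1.2 s estimate falls outside the 50 ms window, yielding two matches, one FP, and one FN.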
References
Bello, J.P., Daudet, L., Sandler, M.B.: Automatic piano transcription using frequency and time-domain information. IEEE Trans. Audio Speech Lang. Process. 14(6), 2242–2251 (2006)
Benetos, E., Cherla, S., Weyde, T.: An efficient shift-invariant model for polyphonic music transcription. In: 6th International Workshop on Machine Learning and Music (2013)
Benetos, E., Dixon, S., Giannoulis, D., Kirchhoff, H., Klapuri, A.: Automatic music transcription: challenges and future directions. J. Intell. Inf. Syst. 41(3), 407–434 (2013)
de Cheveigné, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)
Dessein, A., Cont, A., Lemaitre, G.: Real-time polyphonic music transcription with nonnegative matrix factorization and beta-divergence. In: 11th International Society for Music Information Retrieval Conference, pp. 489–494 (2010)
Duan, Z., Pardo, B., Zhang, C.: Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions. IEEE Trans. Audio Speech Lang. Process. 18(8), 2121–2133 (2010)
Emiya, V., Badeau, R., David, B.: Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Trans. Audio Speech Lang. Process. 18(6), 1643–1654 (2010)
Ewert, S., Müller, M., Grosche, P.: High resolution audio synchronization using chroma onset features. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1869–1872 (2009)
Fritsch, J.: High Quality Musical Audio Source Separation. Master's thesis, Queen Mary Centre for Digital Music (2012)
Holzapfel, A., Stylianou, Y., Gedik, A., Bozkurt, B.: Three dimensions of pitched instrument onset detection. IEEE Trans. Audio Speech Lang. Process. 18(6), 1517–1527 (2010)
MIREX 2014 Multiple Fundamental Frequency Estimation and Tracking Challenge. http://www.music-ir.org/mirex/wiki/2014:Multiple_Fundamental_Frequency_Estimation_%26_Tracking
MIREX 2014 Audio Onset Detection Challenge. http://www.music-ir.org/mirex/wiki/2014:Audio_Onset_Detection
Poliner, G., Ellis, D.: A discriminative model for polyphonic piano transcription. EURASIP J. Adv. Sig. Process. 8, 154–162 (2007)
Vincent, E., Bertin, N., Badeau, R.: Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio Speech Lang. Process. 18(3), 528–537 (2010)
Acknowledgments
The authors would like to thank Patricia Hsu for performing the experiment of the proposed annotation method. The authors also thank Che-Yuan Kevin Liang, Yuan-Ping Chen, and Pei-I Chen for checking the annotations. This work was supported by a grant from the Ministry of Science and Technology under the contract MOST 102-2221-E-001-004-MY3 and the Academia Sinica Career Development Program. Dr. Li Su was further supported by a postdoctoral fellowship from the Academia Sinica.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Su, L., Yang, YH. (2016). Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription. In: Kronland-Martinet, R., Aramaki, M., Ystad, S. (eds) Music, Mind, and Embodiment. CMMR 2015. Lecture Notes in Computer Science(), vol 9617. Springer, Cham. https://doi.org/10.1007/978-3-319-46282-0_20
DOI: https://doi.org/10.1007/978-3-319-46282-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46281-3
Online ISBN: 978-3-319-46282-0