Abstract
While recent years have witnessed substantial progress in algorithms for automatic music transcription (AMT), the development of general and sizable datasets for AMT evaluation has been relatively stagnant, largely because manually annotating and checking such datasets is labor-intensive and time-consuming. In this paper we propose a novel note-level annotation method for building AMT datasets that exploits the human ability to follow music in real time. To assess the quality of the annotation, we further propose an efficient method for qualifying an AMT dataset based on the concepts of onset error difference and the tolerance computed from the evaluation result. Experiments on five piano solos and four woodwind quintets indicate that the proposed annotation method is reliable for the evaluation of AMT algorithms.
Notes
- 1. In this paper we restrict the scope of AMT to the subtask of onset-only note tracking. Details can be found on the webpage of the MIREX Multi-F0 Challenge [11].
- 2. In this paper the term “music excerpt” means a segment (usually 20–30 seconds) of audio content taken from a longer music composition.
- 4. To paraphrase J.P. Bello et al. [1]: “Hand-marking is a painful and time-consuming task that leaves no room for the cross-validation of annotations.”
- 7. In our scenario we only need to find the “note name” rather than the exact “fundamental frequency”. More specifically, we do not discriminate between fundamental frequencies of 440 Hz, 438 Hz, and 442 Hz; we only need to identify “A4”. In other words, our pitch detection task tolerates an error of up to half a semitone (\(\pm 3\)%).
- 10. The family of \(\beta \)-divergences includes the Euclidean distance (\(\beta =2\)), the Kullback–Leibler divergence (\(\beta \rightarrow 1\)), and the Itakura–Saito divergence (\(\beta \rightarrow 0\)).
- 12. We further assume that FPs and FNs are caused by the AMT algorithm.
- 13. Obviously, the onset error difference is larger than zero and smaller than the tolerance \(\delta \).
- 14. These criteria are arbitrary and depend on how accurate the annotation needs to be in the target application.
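The pitch-naming convention in note 7 — rounding any frequency within half a semitone of a nominal equal-tempered pitch to its note name — can be sketched as follows. This is an illustrative helper, not code from the paper; the function name and the A4 = 440 Hz reference are our own choices:

```python
import math

# Pitch-class names in equal temperament, indexed by MIDI number mod 12.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note_name(freq_hz, a4_hz=440.0):
    """Round a fundamental frequency to the nearest equal-tempered note name.

    Any frequency within half a semitone (about +/-3%) of a note's nominal
    frequency maps to that note, so 438, 440 and 442 Hz all yield "A4".
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))  # nearest MIDI number
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(freq_to_note_name(438.0), freq_to_note_name(440.0), freq_to_note_name(442.0))
# all three map to A4
```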
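As a companion to the \(\beta \)-divergence family mentioned in note 10, here is a minimal pointwise sketch with its three named special cases. The closed-form limits for \(\beta \rightarrow 1\) and \(\beta \rightarrow 0\) are the standard ones; note that the \(\beta =2\) case is written here as half the squared Euclidean distance, a common convention:

```python
import math

def beta_divergence(x, y, beta):
    """Pointwise beta-divergence d_beta(x, y) between two positive scalars."""
    if beta == 2:
        # (half) squared Euclidean distance
        return 0.5 * (x - y) ** 2
    if beta == 1:
        # Kullback-Leibler divergence (limit beta -> 1)
        return x * math.log(x / y) - x + y
    if beta == 0:
        # Itakura-Saito divergence (limit beta -> 0)
        return x / y - math.log(x / y) - 1
    # general closed form for beta not in {0, 1}
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))
```

In NMF-based transcription the divergence is summed over all time-frequency bins of the spectrogram and its factorized approximation; the choice of \(\beta \) controls how strongly large-magnitude bins dominate the fit.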
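The bound in note 13 follows from how onsets are matched under a tolerance: only estimated onsets within \(\pm \delta \) of a reference onset can count as true positives, so every matched onset error lies in \([0, \delta ]\). The sketch below shows MIREX-style onset matching; the function name, the greedy nearest-match strategy, and the 50 ms default are our assumptions for illustration, not the paper's exact procedure:

```python
def match_onsets(reference, estimated, delta=0.05):
    """Match estimated onsets (in seconds) to reference onsets within +/-delta.

    Returns the absolute onset errors of matched pairs (each in [0, delta]),
    plus the counts of false positives and false negatives.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    errors, used = [], set()
    for r in reference:
        # nearest unmatched estimate within the tolerance window
        best, best_err = None, delta
        for j, e in enumerate(estimated):
            if j in used:
                continue
            err = abs(e - r)
            if err <= best_err:
                best, best_err = j, err
        if best is not None:
            used.add(best)
            errors.append(best_err)
    fn = len(reference) - len(errors)  # unmatched reference notes
    fp = len(estimated) - len(errors)  # unmatched estimates
    return errors, fp, fn
```

For example, with `reference = [0.0, 1.0, 2.0]` and `estimated = [0.01, 1.2, 2.03]`, the 1.2 s estimate falls outside the 50 ms window, yielding two matches, one FP, and one FN.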
References
Bello, J.P., Daudet, L., Sandler, M.B.: Automatic piano transcription using frequency and time-domain information. IEEE Trans. Audio Speech Lang. Process. 14(6), 2242–2251 (2006)
Benetos, E., Cherla, S., Weyde, T.: An efficient shift-invariant model for polyphonic music transcription. In: 6th International Workshop on Machine Learning and Music (2013)
Benetos, E., Dixon, S., Giannoulis, D., Kirchhoff, H., Klapuri, A.: Automatic music transcription: challenges and future directions. J. Intell. Inf. Syst. 41(3), 407–434 (2013)
de Cheveigné, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)
Dessein, A., Cont, A., Lemaitre, G.: Real-time polyphonic music transcription with nonnegative matrix factorization and beta-divergence. In: 11th International Society for Music Information Retrieval Conference, pp. 489–494 (2010)
Duan, Z., Pardo, B., Zhang, C.: Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions. IEEE Trans. Audio Speech Lang. Process. 18(8), 2121–2133 (2010)
Emiya, V., Badeau, R., David, B.: Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Trans. Audio Speech Lang. Process. 18(6), 1643–1654 (2010)
Ewert, S., Müller, M., Grosche, P.: High resolution audio synchronization using chroma onset features. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1869–1872 (2009)
Fritsch, J.: High Quality Musical Audio Source Separation. Master's thesis, Queen Mary Centre for Digital Music (2012)
Holzapfel, A., Stylianou, Y., Gedik, A., Bozkurt, B.: Three dimensions of pitched instrument onset detection. IEEE Trans. Audio Speech Lang. Process. 18(6), 1517–1527 (2010)
MIREX 2014 Multiple Fundamental Frequency Estimation and Tracking Challenge. http://www.music-ir.org/mirex/wiki/2014:Multiple_Fundamental_Frequency_Estimation_%26_Tracking
MIREX 2014 Audio Onset Detection Challenge. http://www.music-ir.org/mirex/wiki/2014:Audio_Onset_Detection
Poliner, G., Ellis, D.: A discriminative model for polyphonic piano transcription. EURASIP J. Adv. Sig. Process. 8, 154–162 (2007)
Vincent, E., Bertin, N., Badeau, R.: Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio Speech Lang. Process. 18(3), 528–537 (2010)
Acknowledgments
The authors would like to thank Patricia Hsu for performing the experiment of the proposed annotation method. The authors also thank Che-Yuan Kevin Liang, Yuan-Ping Chen, and Pei-I Chen for checking the annotations. This work was supported by a grant from the Ministry of Science and Technology under the contract MOST 102-2221-E-001-004-MY3 and the Academia Sinica Career Development Program. Dr. Li Su was further supported by a postdoctoral fellowship from the Academia Sinica.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Su, L., Yang, YH. (2016). Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription. In: Kronland-Martinet, R., Aramaki, M., Ystad, S. (eds) Music, Mind, and Embodiment. CMMR 2015. Lecture Notes in Computer Science(), vol 9617. Springer, Cham. https://doi.org/10.1007/978-3-319-46282-0_20
DOI: https://doi.org/10.1007/978-3-319-46282-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46281-3
Online ISBN: 978-3-319-46282-0