An effective method for audio-to-score alignment using onsets and modified constant Q spectra

Chen, Chunta; Jang, Jyh-Shing Roger

doi:10.1007/s11042-018-6349-y

Published: 05 July 2018

Multimedia Tools and Applications, Volume 78, pages 2017–2044 (2019)

Abstract

This paper proposes an effective algorithm for polyphonic audio-to-score alignment that aligns a polyphonic music performance to its corresponding score. The proposed framework consists of three steps: onset detection, note matching, and dynamic programming. In the first step, onsets are detected and then onset features are extracted by applying the constant Q transform around each onset. A similarity matrix is computed using a note-matching function to evaluate the similarity between concurrent notes in the music score and onsets in the audio recording. Finally, dynamic programming is used to extract the optimal alignment path in the similarity matrix. We compared five onset detectors and three spectrum difference vectors at selected audio onsets. The experimental results revealed that our method achieved higher precision than did the other algorithms included for comparison. This paper also proposes an online approach based on onset detection that can detect most notes within only 10 ms. Based on our experimental results, this online approach outperforms all methods included for comparison when the tolerance window is 50 ms.
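
The final step, extracting an alignment path from the similarity matrix, can be illustrated with a minimal sketch. The abstract does not give the recurrence, so the three moves used here (match, skip a spurious onset, skip a missed concurrence) and the function name `align` are assumptions chosen for illustration, not the authors' formulation.

```python
def align(S):
    """Best monotonic one-to-one alignment through a similarity matrix.

    S[i][j] is the similarity between audio onset i and score concurrence j.
    Returns (total_similarity, [(onset_index, concurrence_index), ...]).
    The move set is an illustrative assumption, not the paper's recurrence.
    """
    M, N = len(S), len(S[0])
    # D[i][j]: best cumulative similarity over the first i onsets
    # and the first j concurrences.
    D = [[0.0] * (N + 1) for _ in range(M + 1)]
    back = [[None] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            candidates = [
                (D[i - 1][j - 1] + S[i - 1][j - 1], "match"),
                (D[i - 1][j], "skip_onset"),        # spurious audio onset
                (D[i][j - 1], "skip_concurrence"),  # missed score event
            ]
            D[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Trace the stored moves back from the bottom-right corner.
    path = []
    i, j = M, N
    while i > 0 and j > 0:
        move = back[i][j]
        if move == "match":
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_onset":
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[M][N], path


# Four detected onsets (index 1 is spurious) against three concurrences.
S = [
    [0.9, 0.1, 0.0],
    [0.2, 0.1, 0.1],
    [0.1, 0.8, 0.2],
    [0.0, 0.1, 0.7],
]
total, path = align(S)
print(path)
```

Because alignment is done over detected onsets rather than over every audio frame, the table has only M × N cells, where M is the number of onsets and N the number of score concurrences.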

Notes

The Music Information Retrieval Evaluation eXchange (MIREX, http://www.music-ir.org/mirex) is an annual evaluation campaign for MIR algorithms. Score following is one of the evaluation tasks.
The revised labels of the dataset can be downloaded from https://github.com/audioscoredata/audio-to-score-label

Acknowledgments

This research was partially supported by the Ministry of Science and Technology, ROC, under Grant No. MOST 104-2221-E-002-051-MY3.

Author information

Authors and Affiliations

Computer Science Department, National Tsing Hua University, Hsinchu City, Taiwan
Chunta Chen
Computer Science Department, National Taiwan University, Taipei City, Taiwan
Jyh-Shing Roger Jang

Corresponding author

Correspondence to Chunta Chen.

Appendix

To make the presentation of the proposed framework clear, the symbols and their definitions are listed below.

w_m: The window size in frames, where m is an integer
θ: The thresholding parameter of the peak picking
n_a: The frame index of the average time of a peak group
Hv: The harmonic curve, derived from the harmonic component of a music recording
Pv: The percussive curve, derived from the percussive component of a music recording
n_p: The frame index of a local maximum of Pv
n_h: The frame index of a local maximum of Hv
dA_i: The spectrum difference around onset i
\( {\psi}_i^m \): The spectrum difference vector derived from the spectrum difference dA_i, where m denotes the type of processing
\( {\mathcal{g}}_j^{\ell } \): The set of note pitches and overtones of concurrence j in the score, where ℓ is the number of overtones
Ω: The overtone vector
S: The similarity matrix of the input audio and the music score
M: The number of detected onsets in the audio
N: The number of concurrences in the score
η: The local tempo coefficient
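
As a rough illustration of how several of these symbols fit together, the sketch below builds a pitch-and-overtone set for each concurrence and fills an M × N similarity matrix S from per-onset spectrum difference vectors. The excerpt does not define the note-matching function, so everything specific here is an assumption: spectrum difference vectors are represented as dictionaries keyed by semitone bin (MIDI pitch number), overtones are rounded to the nearest semitone, and the score is the fraction of positive spectral change that falls on the expected bins. The function names are hypothetical.

```python
import math


def pitch_overtone_set(pitches, num_overtones):
    """Semitone bins of the given MIDI pitches and their overtones.

    Overtone k of a pitch has (k + 1) times the fundamental frequency,
    i.e. it lies 12 * log2(k + 1) semitones above it (rounded here).
    """
    bins = set()
    for p in pitches:
        for k in range(num_overtones + 1):  # k = 0 is the fundamental
            bins.add(p + round(12 * math.log2(k + 1)))
    return bins


def similarity_matrix(psi, concurrences, num_overtones=2):
    """S[i][j] for M onsets and N concurrences (illustrative scoring only).

    psi: list of M dicts mapping semitone bin -> spectrum difference value.
    concurrences: list of N lists of MIDI pitches sounding together.
    """
    sets = [pitch_overtone_set(c, num_overtones) for c in concurrences]
    S = []
    for vec in psi:
        # Only increases in energy are treated as evidence of a new note.
        total = sum(v for v in vec.values() if v > 0)
        row = []
        for g in sets:
            hit = sum(v for b, v in vec.items() if b in g and v > 0)
            row.append(hit / total if total > 0 else 0.0)
        S.append(row)
    return S


# One onset with energy rising at C4 (60), C5 (72) and C#4 (61),
# compared against two single-note concurrences: C4 and C#4.
psi = [{60: 1.0, 72: 0.5, 61: 0.5}]
S = similarity_matrix(psi, [[60], [61]], num_overtones=2)
print(S)
```

Including overtones in the expected set means that energy at the octave above C4 counts toward the C4 concurrence instead of being treated as unexplained.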

About this article

Cite this article

Chen, C., Jang, JS.R. An effective method for audio-to-score alignment using onsets and modified constant Q spectra. Multimed Tools Appl 78, 2017–2044 (2019). https://doi.org/10.1007/s11042-018-6349-y

Received: 28 August 2017
Revised: 07 June 2018
Accepted: 29 June 2018
Published: 05 July 2018
Issue Date: January 2019
DOI: https://doi.org/10.1007/s11042-018-6349-y
