DOI: 10.1145/3556223.3556255
research-article

Audio Feature Extraction for DTW-based Audio-to-Score Alignment

Published: 16 October 2022

ABSTRACT

Audio-to-score alignment is a music information retrieval (MIR) task concerned with finding the real-world times at which the notes of a score occur in a corresponding audio recording. Recent studies that synthesize MIDI to audio, apply audio feature extraction, and then perform dynamic time warping (DTW)-based alignment have achieved a mean alignment error of about 10 milliseconds on piano music, but their robustness should also be evaluated in a real-world scenario. In this paper, we implemented a standard DTW-based audio-to-score alignment system with audio feature extraction techniques for musical onset enhancement, and evaluated its robustness in a real-world scenario, namely MIR database building. To reflect this type of usage, we used three different synthesizers and real-world performance data from CrestMusePEDB, simulating the absence of prior information about audio recording conditions and velocity. Our results show that velocity from real-world performances and the choice of synthesizer can severely degrade a DTW-based alignment system, almost doubling the average mean error in most cases. We also made a practical attempt at combining phase-based onset feature extraction with a conventional MIDI-audio alignment framework on real-world flute recordings, indicating the potential benefits of combining different types of audio features.
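The pipeline summarized above (feature sequences for the score and the recording, DTW alignment, onset times read off the warping path, evaluation by mean alignment error) can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain NumPy, cosine distance, the basic DTW step sizes (1,0), (0,1), (1,1), and toy one-hot "chroma" frames standing in for features extracted from synthesized and recorded audio (the MIDI synthesis step is skipped entirely). The function names, the 10 ms hop size, and the reference onset times are hypothetical.

```python
import numpy as np

def cosine_cost(X, Y):
    """Pairwise cosine distance between score frames X (N x D) and audio frames Y (M x D)."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Yn = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return 1.0 - Xn @ Yn.T

def dtw_path(C):
    """Classic DTW with step sizes (1,0), (0,1), (1,1); returns the optimal warping path."""
    N, M = C.shape
    D = np.full((N + 1, M + 1), np.inf)  # accumulated cost, padded with an inf border
    D[0, 0] = 0.0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            D[n, m] = C[n - 1, m - 1] + min(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1])
    # Backtrack from (N, M) to (1, 1), always stepping to the cheapest predecessor.
    n, m = N, M
    path = [(n - 1, m - 1)]
    while (n, m) != (1, 1):
        preds = [(n - 1, m - 1), (n - 1, m), (n, m - 1)]
        n, m = min(preds, key=lambda p: D[p])
        path.append((n - 1, m - 1))
    return path[::-1]  # (score_frame, audio_frame) pairs in forward order

def align_onsets(path, score_onset_frames, hop_s):
    """Map each score onset frame to the first audio frame the path pairs with it (in seconds)."""
    first_audio = {}
    for i, j in path:
        first_audio.setdefault(i, j)
    return np.array([first_audio[f] for f in score_onset_frames]) * hop_s

def mean_alignment_error(estimated_s, reference_s):
    """Mean absolute difference between estimated and reference onset times (seconds)."""
    return float(np.mean(np.abs(np.asarray(estimated_s) - np.asarray(reference_s))))

def one_hot_frames(pitch_classes, durations, dim=12):
    """Toy feature sequence: one one-hot 'chroma' vector per frame."""
    frames = [pc for pc, d in zip(pitch_classes, durations) for _ in range(d)]
    return np.eye(dim)[frames]

# The "audio" plays the same notes as the score, but with different durations.
score = one_hot_frames([0, 4, 7, 0], [2, 2, 2, 2])  # onsets at score frames 0, 2, 4, 6
audio = one_hot_frames([0, 4, 7, 0], [3, 2, 4, 3])  # onsets at audio frames 0, 3, 5, 9
hop_s = 0.01  # hypothetical 10 ms hop size
path = dtw_path(cosine_cost(score, audio))
est = align_onsets(path, [0, 2, 4, 6], hop_s)
err = mean_alignment_error(est, [0.0, 0.031, 0.048, 0.095])  # hypothetical annotations
print(est, err)
```

Because the toy features match exactly, the only zero-cost paths pair each score note with the same note in the audio, so the estimated onsets land on audio frames 0, 3, 5 and 9 (0.00, 0.03, 0.05 and 0.09 s) and the mean error against the hypothetical annotations is about 2 ms. With real synthesized and recorded audio, the features are never identical, which is where effects such as synthesizer choice and performance velocity can pull the path away from the true onsets.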

References

  1. Müller, M. (2007). Information retrieval for music and motion (Vol. 2). Heidelberg: Springer.
  2. Müller, M. (2015). Fundamentals of music processing: Audio, analysis, algorithms, applications (Vol. 3). Cham: Springer.
  3. Foscarin, F., McLeod, A., Rigaux, P., Jacquemard, F., & Sakai, M. (2020, October). ASAP: a dataset of aligned scores and performances for piano transcription. In International Society for Music Information Retrieval Conference (pp. 534-541).
  4. Shi, Z., Sapp, C., Arul, K., McBride, J., & Smith III, J. O. (2019, May). SUPRA: Digitizing the Stanford University Piano Roll Archive. In ISMIR (pp. 517-523).
  5. Ewert, S., Müller, M., & Grosche, P. (2009, April). High resolution audio synchronization using chroma onset features. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1869-1872). IEEE.
  6. Hu, N., Dannenberg, R. B., & Tzanetakis, G. (2003, October). Polyphonic audio matching and alignment for music retrieval. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684) (pp. 185-188). IEEE.
  7. Kwon, T., Jeong, D., & Nam, J. (2017, July). Audio-to-Score Alignment of Piano Music Using RNN-based Automatic Music Transcription. In The 14th Sound and Music Computing Conference. SMCNetwork.
  8. Bello, J. P., Duxbury, C., Davies, M., & Sandler, M. (2004). On the use of phase and energy for musical onset detection in the complex domain. IEEE Signal Processing Letters, 11(6), 553-556.
  9. Hashida, M., Matsui, T., & Katayose, H. (2008). A New Music Database Describing Deviation Information of Performance Expressions. In ISMIR (pp. 489-494).
  10. FluidSynth. Web resource. https://www.fluidsynth.org/
  11. Friberg, A., & Sundberg, J. (1993). Perception of just-noticeable time displacement of a tone presented in a metrical sequence at different tempos. The Journal of the Acoustical Society of America, 94(3), 1859-1859.
  12. Braga Brum, J. P. (2018). Traditional Flute Dataset for Score Alignment. Web resource. https://www.kaggle.com/jbraga/traditional-flute-dataset

    Published in

      ACM Other conferences
      ICCCM '22: Proceedings of the 10th International Conference on Computer and Communications Management
      July 2022
      289 pages
      ISBN: 9781450396349
      DOI: 10.1145/3556223

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
