Audio Feature Extraction for DTW-based Audio-to-Score Alignment

Authors:
Yifan Ding

Graduate School of Science and Technology, University of Tsukuba, Japan

Mizutani Tetsuya

Department of Information Engineering, University of Tsukuba, Japan


ICCCM '22: Proceedings of the 10th International Conference on Computer and Communications Management, July 2022, Pages 214–220
https://doi.org/10.1145/3556223.3556255

Published: 16 October 2022

ABSTRACT

Audio-to-score alignment is a music information retrieval (MIR) task concerned with determining the real-world time at which each note of a score appears in a corresponding audio recording. Although recent studies that synthesize MIDI to audio and then apply audio feature extraction and DTW-based alignment have achieved a mean alignment error of about 10 milliseconds for piano music, robustness should also be evaluated in a real-world scenario. In this paper, we implement a standard DTW-based audio-to-score alignment system with audio feature extraction techniques for musical onset enhancement, and evaluate its robustness in one such scenario, namely MIR database building. With this use in mind, we use three different synthesizers and real-world performance data from CrestMusePEDB to simulate the absence of prior information about audio recording conditions and note velocities. The results show that velocities taken from real-world performances and the choice of synthesizer can severely degrade a DTW-based alignment system, almost doubling the average mean error in most cases. We also make a practical attempt at combining phase-based onset feature extraction with the conventional MIDI-audio alignment framework for aligning real-world flute recordings, which indicates the potential benefits of combining different types of audio features.
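
The abstract does not give implementation details, so the following is only a minimal sketch of the general DTW-based alignment idea, not the authors' system. It assumes frame-wise feature matrices (for example chroma-like vectors) for the synthesized score and the recording, a cosine distance, and the classic DTW step set; the function names `cosine_cost`, `dtw_path`, `map_onsets` and `mean_alignment_error` are hypothetical and chosen for illustration.

```python
import numpy as np


def cosine_cost(X, Y, eps=1e-9):
    """Pairwise cosine distance between frames of X (n, d) and Y (m, d)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + eps)
    return 1.0 - Xn @ Yn.T


def dtw_path(C):
    """Classic DTW with steps (1,1), (1,0), (0,1); returns the optimal path."""
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
            D[i, j] = C[i - 1, j - 1] + best_prev
    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]


def map_onsets(path, score_frames):
    """Map each score frame index to the first audio frame aligned to it."""
    first = {}
    for i, j in path:
        first.setdefault(i, j)
    return np.array([first[f] for f in score_frames])


def mean_alignment_error(est_frames, ref_frames, hop_seconds):
    """Mean absolute onset deviation, converted from frames to seconds."""
    diff = np.abs(np.asarray(est_frames) - np.asarray(ref_frames))
    return float(np.mean(diff) * hop_seconds)


# Toy data (assumption): three "notes" as one-hot feature frames.
notes = np.eye(3)
score = np.repeat(notes, 2, axis=0)   # 6 frames, note onsets at frames 0, 2, 4
audio = np.repeat(notes, 4, axis=0)   # same notes played at half the tempo

path = dtw_path(cosine_cost(score, audio))
estimated = map_onsets(path, [0, 2, 4])
error = mean_alignment_error(estimated, [0, 4, 8], hop_seconds=0.01)
print(estimated, error)
```

In this toy case the recording is an exact time-stretched copy of the score, so the estimated onsets coincide with the reference onsets. With real recordings, the feature representation and the synthesizer used to render the score determine how well matching frames stand out in the cost matrix, which is the kind of sensitivity the abstract reports.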

Index Terms

Information systems → Information retrieval → Specialized information retrieval → Multimedia and multimodal retrieval → Music retrieval

Published in

ICCCM '22: Proceedings of the 10th International Conference on Computer and Communications Management
July 2022
289 pages
ISBN:9781450396349
DOI:10.1145/3556223

Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Audio alignment
Chroma features
Music information retrieval
Onset feature extraction
Signal processing
Qualifiers
- research-article
- Research
- Refereed limited