ABSTRACT
Automatic Music Transcription (AMT) is an important task in Music Information Retrieval (MIR). Many researchers have focused on Convolutional Neural Network (CNN) architectures for transcription. In this paper, we construct a CNN-based piano music transcription model that uses an energy-balanced (EB) constant-Q transform (CQT) spectrogram, which we call the EB-enhanced CNN model. Unlike standard CNN-based methods, the proposed model balances the energy of the input features, so that many pitches previously missed due to weak energy can be detected successfully. Training and evaluation are performed on MAPS, a public dataset for piano transcription. Our technique achieves a 3.53% F1-score improvement over the state-of-the-art method on the MAPS ENSTDkCl subset.
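The abstract does not specify the exact form of the energy-balancing transform. As a minimal illustration only, the sketch below shows one plausible way weak pitches can be brought onto a comparable scale with strong ones before a CNN sees the spectrogram: log compression followed by per-bin normalization. The toy spectrogram values and the specific normalization are assumptions, not the paper's actual EB method.

```python
import numpy as np

# Toy magnitude "spectrogram": 4 frequency bins x 8 frames (hypothetical values).
# Bin 0 carries a strong pitch; bin 2 carries a weak pitch that a detector
# operating on raw magnitudes could easily miss.
S = np.zeros((4, 8))
S[0, :] = 10.0   # strong pitch
S[2, :] = 0.1    # weak pitch (100x lower energy)

# Assumed energy-balancing step: log compression, then per-bin max
# normalization, so both active bins end up near the same scale.
eps = 1e-6
L = np.log1p(S / eps)                          # compress the dynamic range
B = L / (L.max(axis=1, keepdims=True) + eps)   # normalize each bin; silent bins stay 0

# Raw energies differ by a factor of 100; after balancing, both active
# bins peak near 1.0, while the silent bins remain exactly 0.
print(round(float(B[0].max()), 3), round(float(B[2].max()), 3))
```

The point of the sketch is only the qualitative effect claimed in the abstract: after balancing, a pitch that was two orders of magnitude weaker in raw energy is no longer two orders of magnitude weaker at the network input.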