ABSTRACT
Automatic Music Transcription (AMT) is an important task in Music Information Retrieval (MIR). Many researchers have focused on Convolutional Neural Network (CNN) architectures for transcription. In this paper, we construct a CNN-based piano music transcription model that uses an energy-balanced (EB) constant-Q transform (CQT) spectrogram, which we call the EB-enhanced CNN model. Unlike standard CNN-based methods, the proposed model balances the energy of the input features, so that many pitches previously missed due to weak energy can be detected successfully. Training and evaluation are performed on MAPS, a public dataset for piano transcription. Our technique achieves a 3.53% F1-score improvement over the state-of-the-art method on the MAPS ENSTDkCl subset.
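The abstract does not specify the exact form of the energy-balancing transform. As a minimal illustration only, the sketch below shows one plausible way weak pitches can be brought onto a comparable scale with strong ones before a CNN sees the spectrogram: log compression followed by per-bin normalization. The toy spectrogram values and the specific normalization are assumptions, not the paper's actual EB method.

```python
import numpy as np

# Toy magnitude "spectrogram": 4 frequency bins x 8 frames (hypothetical values).
# Bin 0 carries a strong pitch; bin 2 carries a weak pitch that a detector
# operating on raw magnitudes could easily miss.
S = np.zeros((4, 8))
S[0, :] = 10.0   # strong pitch
S[2, :] = 0.1    # weak pitch (100x lower energy)

# Assumed energy-balancing step: log compression, then per-bin max
# normalization, so both active bins end up near the same scale.
eps = 1e-6
L = np.log1p(S / eps)                          # compress the dynamic range
B = L / (L.max(axis=1, keepdims=True) + eps)   # normalize each bin; silent bins stay 0

# Raw energies differ by a factor of 100; after balancing, both active
# bins peak near 1.0, while the silent bins remain exactly 0.
print(round(float(B[0].max()), 3), round(float(B[2].max()), 3))
```

The point of the sketch is only the qualitative effect claimed in the abstract: after balancing, a pitch that was two orders of magnitude weaker in raw energy is no longer two orders of magnitude weaker at the network input.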