Robust Piano Music Transcription Based on Computer Vision

Authors: Jun Li, Wei Xu, Yong Cao, Wei Liu, Wenqing Cheng

Huazhong University of Science and Technology, Wuhan, China

HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence, July 2020, Pages 92–97
https://doi.org/10.1145/3409501.3409540

Published: 25 August 2020

ABSTRACT

Automatic music transcription, which aims to convert acoustic music signals into symbolic notation, has recently attracted increasing attention. To address the challenges of transcription based on acoustic information, traditional vision-based approaches use the Hough transform to locate the piano keyboard and a weak classifier to detect pressed keys. However, the Hough transform and the weak classifier show insufficient detection ability in changing environments. In this paper, we devise a robust visual piano transcription system that uses semantic segmentation for piano keyboard detection and a CNN-based classifier to detect pressed keys, which improves frame-level transcription results. In addition, because public datasets for visual piano transcription are lacking, we propose a new dataset for this task. To demonstrate the effectiveness of our system, we evaluate it on both a previously published dataset and our proposed dataset, where it significantly outperforms state-of-the-art approaches.
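The abstract reports frame-level results but does not state the metric. A common convention in transcription work is to pool precision, recall, and F1 over the keys judged pressed in each frame. The sketch below assumes that convention and is not taken from the paper. The function name `frame_level_scores` and the representation of each frame as a set of key indices are hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple


def frame_level_scores(
    predicted: List[Set[int]], reference: List[Set[int]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all video frames.

    Each list element is the set of key indices judged pressed in one frame.
    This set-based layout is an assumption, not the paper's data format.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must cover the same frames")

    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)  # keys correctly detected as pressed
        fp += len(pred - ref)  # keys detected but not actually pressed
        fn += len(ref - pred)  # pressed keys that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

For example, with predictions `[{60, 64}, {60}, set()]` against references `[{60, 64, 67}, {62}, set()]`, the pooled counts are 2 true positives, 1 false positive, and 2 false negatives.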

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks

Published in

HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence
July 2020
276 pages
ISBN: 9781450375603
DOI: 10.1145/3409501

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Automatic Music Transcription
Computer Vision
Convolutional Neural Network
Semantic Segmentation
Qualifiers
- research-article
- Research
- Refereed limited