ABSTRACT
Automatic music transcription, which aims to convert acoustic music signals into symbolic notation, has recently attracted increasing attention. To address the challenges of transcription based on acoustic information, traditional visual approaches use the Hough transform to locate the piano keyboard and a weak classifier to detect pressed keys. However, the Hough transform and the weak classifier detect poorly under changing environments. In this paper, we devise a robust visual piano transcription system that uses semantic segmentation for piano keyboard detection and a CNN-based classifier to detect pressed keys, which improves frame-level transcription results. In addition, given the lack of public datasets for visual piano transcription, we propose a new dataset for this task. To demonstrate the effectiveness of our system, we evaluate it on both a previously published dataset and our proposed dataset, where it significantly outperforms state-of-the-art approaches.
Index Terms
- Robust Piano Music Transcription Based on Computer Vision