
A real-time system for online learning-based visual transcription of piano music

Published in: Multimedia Tools and Applications

Abstract

To address the challenges of acoustic-based music information retrieval tasks such as automatic music transcription, video of the musical performance can be exploited. This paper presents a new real-time, learning-based system that visually transcribes piano music by classifying the pressed black and white keys with a CNN-SVM pipeline. The entire process relies on visual analysis of the piano keyboard and of the pianist's hands and fingers. The system achieves high accuracy, with an average F1 score of 0.95, even under non-ideal camera views, hand coverage, and lighting conditions, and transcribes music in real time with low latency (about 20 ms). In addition, a new dataset for visual transcription of piano music is created and made available to researchers in this area. Since not all possible varying patterns of the data are available in advance, an online learning approach is applied to efficiently update the original model as new data are added to the training dataset.
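As a rough illustration of the two ideas in the abstract (classifying key-press features with an SVM, then updating the model online as new labeled data arrive), the pipeline could be sketched as below. This is not the authors' implementation: the CNN feature extractor is simulated with synthetic 64-D vectors, and a linear SVM trained by SGD (hinge loss) stands in for the paper's classifier so that incremental updates via `partial_fit` are possible. All names and parameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def fake_cnn_features(n, pressed):
    """Stand-in for CNN features of a candidate key region.

    In the paper a CNN extracts features from images of the keyboard;
    here we simulate separable 64-D feature vectors for illustration.
    """
    center = 1.0 if pressed else -1.0
    return center + rng.normal(scale=0.5, size=(n, 64))

# Initial training set: features for pressed (1) and unpressed (0) keys.
X_train = np.vstack([fake_cnn_features(200, True), fake_cnn_features(200, False)])
y_train = np.array([1] * 200 + [0] * 200)

# Linear SVM trained with SGD (hinge loss), chosen because it supports
# incremental (online) updates through partial_fit.
clf = SGDClassifier(loss="hinge", random_state=0)
clf.partial_fit(X_train, y_train, classes=[0, 1])

# Online learning step: refine the existing model with newly labeled
# samples instead of retraining from scratch on the full dataset.
X_new = np.vstack([fake_cnn_features(20, True), fake_cnn_features(20, False)])
y_new = np.array([1] * 20 + [0] * 20)
clf.partial_fit(X_new, y_new)

# Evaluate on held-out synthetic samples.
X_test = np.vstack([fake_cnn_features(50, True), fake_cnn_features(50, False)])
y_test = np.array([1] * 50 + [0] * 50)
acc = clf.score(X_test, y_test)
print(f"accuracy after online update: {acc:.2f}")
```

In a real system the `fake_cnn_features` call would be replaced by a forward pass of the trained CNN over each detected key region, and `partial_fit` would be invoked whenever new annotated frames are added to the training dataset.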


[Figs. 1–9 omitted]


Notes

  1. All videos can be downloaded from http://www.sfu.ca/akbari/MTA/Dataset.

  2. The videos can be downloaded from http://www.sfu.ca/akbari/MTA/OnlineLearningExperiments.

  3. The test videos and the classification results can be downloaded from http://www.sfu.ca/akbari/MTA.


Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grants RGPIN312262, STPGP447223, RGPAS478109, and RGPIN288300.

Author information

Correspondence to Mohammad Akbari.


About this article


Cite this article

Akbari, M., Liang, J. & Cheng, H. A real-time system for online learning-based visual transcription of piano music. Multimed Tools Appl 77, 25513–25535 (2018). https://doi.org/10.1007/s11042-018-5803-1

