research-article

Audiovisual Tool for Solfège Assessment

Published: 16 December 2016

Abstract

Solfège is a widely used technique in music learning that involves the vocal performance of melodies, respecting the timing and duration of musical sounds as specified in the music score, together with meter-mimicking performed by hand movement. This article presents an audiovisual approach for the automatic assessment of this musical study practice. The proposed system combines the meter-mimicking gesture (video information) with the melodic transcription (audio information), where the hand movement works as a metronome that controls the time flow (tempo) of the musical piece. Meter-mimicking is thus used to align the music score (ground truth) with the sung melody, allowing assessment even in time-dynamic scenarios, where the tempo varies during the performance. Audio analysis is applied to obtain the melodic transcription of the sung notes, and the solfège performances are evaluated by a set of Bayesian classifiers that were generated from real evaluations made by expert listeners.
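
The alignment idea in the abstract, where hand-beat times act as a metronome that maps score positions to performance time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`beats_to_seconds`, `align_score`, `note_errors`), the piecewise-linear interpolation between beats, the index-based pairing of notes, and all example values are assumptions made only for illustration.

```python
def beats_to_seconds(beat_pos, beat_times):
    """Map a score position (in beats) to seconds.

    Assumption: beat_times[i] is the time at which the hand marked beat i,
    and time flows linearly between two consecutive beats.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    # Pick the beat interval containing beat_pos; clamp so that the last
    # interval is reused for a position exactly at the final beat.
    i = min(max(int(beat_pos), 0), len(beat_times) - 2)
    frac = beat_pos - i
    return beat_times[i] + frac * (beat_times[i + 1] - beat_times[i])


def align_score(score, beat_times):
    """Convert score notes (onset_beats, duration_beats, midi_pitch)
    into (onset_s, offset_s, midi_pitch) using the gesture beat times."""
    return [
        (beats_to_seconds(on, beat_times),
         beats_to_seconds(on + dur, beat_times),
         pitch)
        for on, dur, pitch in score
    ]


def note_errors(aligned, sung):
    """Pair notes by index (a simplification) and return, per note,
    (onset error in seconds, pitch error in semitones)."""
    return [
        (s_on - a_on, s_pitch - a_pitch)
        for (a_on, _, a_pitch), (s_on, s_pitch) in zip(aligned, sung)
    ]


# Hypothetical data: the performer slows down after the third beat.
beat_times = [0.0, 0.5, 1.0, 1.6, 2.2]
score = [(0, 1, 60), (1, 1, 62), (2, 0.5, 64), (2.5, 0.5, 65), (3, 1, 67)]
sung = [(0.02, 60), (0.55, 62), (1.0, 63), (1.35, 65), (1.6, 67)]

aligned = align_score(score, beat_times)
errors = note_errors(aligned, sung)
for (onset_err, pitch_err) in errors:
    print(f"onset error {onset_err:+.2f} s, pitch error {pitch_err:+d} semitones")
```

Because the score is warped by the detected beats rather than by a fixed tempo, the notes after the slowdown are still compared at the positions the performer's own gesture defined. Per-note deviations like these are the kind of features a classifier could consume; the Bayesian classifiers mentioned in the abstract are not reproduced here.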

Cited By

  • (2022) DeepSolfège: Recognizing Solfège Hand Signs Using Convolutional Neural Networks. Advances in Visual Computing, 39-50. DOI: 10.1007/978-3-030-90439-5_4. Online publication date: 1-Jan-2022

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 1
February 2017
278 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3012406
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2016
Accepted: 01 October 2016
Revised: 01 July 2016
Received: 01 March 2016
Published in TOMM Volume 13, Issue 1

Author Tags

  1. Sight-singing
  2. Solfège
  3. automatic assessment
  4. melodic transcription
  5. meter-mimicking
  6. music education

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • CAPES Foundation
  • Ministry of Education of Brazil
