1 Introduction

Musicians introduce deviations from the score when performing a musical piece in order to achieve a particular expressive intention. Computational expressive music performance modelling (CEMPM) aims to characterise such deviations using computational techniques (e.g. machine learning). In this context, CEMPM formulates hypotheses about the expressive devices musicians use when performing (consciously or unconsciously), which can be empirically verified on measured performance data. Empirical models are often obtained from the quantitative analysis of musical performances, based on measurements of timing, dynamics, and articulation (e.g. Shaffer et al. 1985; Clarke 1985; Gabrielsson 1987; Palmer 1996a; Repp 1999; Goebl 2001, to name a few). A state-of-the-art review is presented in Gabrielsson (2003). Computational models have been implemented as rule-based models (Friberg et al. 2000; the KTH model), mathematical models (Todd 1992), and structure-level models (Mazzola 2002).

Machine learning techniques have been used to predict performance variations in timing, articulation, and energy (e.g. Widmer 2002), as well as to model concrete expressive intentions (e.g. mood, musical style, performer). Most of the literature focuses on classical piano music (e.g. Widmer 2002), where the piano keys work as ON/OFF switching devices (e.g. MIDI pianos), which simplifies data acquisition, i.e. the conversion of performance data into machine-readable data. Some exceptions can be found in jazz saxophone music, where case-based reasoning (Arcos et al. 1998) and inductive logic programming (Ramirez et al. 2011) have been used. Jazz guitar expressive performance modelling has been studied by Giraldo and Ramirez (2016a, b), with special emphasis on melodic ornamentation.

However, few studies have addressed the classical guitar, in particular the intrinsic variations performers introduce when no specific expressive intentions are prescribed. In this study, we present a machine learning approach in which CEMPM techniques are applied to study the expressive variations that nine different guitarists introduce when performing the same musical piece, for which no performance indications are provided. We study the correlations among the performers' variations in timing and energy. We extract features from the score to obtain a predictive model for each musician, and then cross-validate the models among performers.

2 Materials and Methods

For this study we obtained recordings of nine professional guitarists performing the same musical piece. The piece was written for classical guitar and was composed specifically for this study. The musicians did not know the piece beforehand, and no particular expressive or performance indications were provided (either written or verbal). The performers were thus free to introduce expressive variations according to their own taste and criteria. Musicians were also allowed to practise the piece for as long as they wanted before recording, until they were satisfied with their interpretation. The recordings took place at different studios/institutions and were collected by the Department of Music of the Faculty of Arts of the University of Quebec in Montreal (UQAM), Canada.

2.1 Framework

The general framework of the project is depicted in Fig. 1.

Fig. 1. Framework and data processing flow.

Data Processing. The musical score was created in MusicXML format, from which we obtained machine-readable (MIDI-type) information for each note, i.e. its onset (in seconds), duration (in seconds), pitch, and velocity (which refers to loudness). We used the score as the deadpan performance (i.e. a robotic, inexpressive performance).
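A minimal sketch of this step is given below, using the music21 library; the file name, the fixed 120 BPM tempo, and the flat velocity are illustrative assumptions, not the study's exact setup.

```python
# Sketch: extract MIDI-type note data from the MusicXML score.
from music21 import converter

BPM = 120                        # assumed constant score tempo; a real tempo map would replace this
sec_per_quarter = 60.0 / BPM

score = converter.parse("score.xml")   # hypothetical file name
notes = []
for n in score.flatten().getElementsByClass("Note"):
    notes.append({
        "onset_s": float(n.offset) * sec_per_quarter,        # score onset in seconds
        "duration_s": float(n.quarterLength) * sec_per_quarter,
        "pitch": n.pitch.midi,                               # MIDI pitch number
        "velocity": 64,                                      # flat dynamics = deadpan performance
    })
```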

In a second stage we obtained machine-readable data of the performance in MIDI-type format. This process was performed in a semi-automatic fashion, using score-informed Non-negative Matrix Factorisation (NMF). The NMF method decomposes an input spectrogram \(X \in \mathbb{R}^{K \times N}\), with K frequency bins and N frames, as:

$$\begin{aligned} X = WH \end{aligned}$$
(1)

where \(W \in \mathbb{R}^{K \times R}\) contains the spectral bases for each of the R pitches and \(H \in \mathbb{R}^{R \times N}\) is the pitch activity matrix across time. The number R of pitches and the initial weights of the W and H matrices were initialised based on the information in the score (for an overview see Clarke 1985). Later, manual correction was performed over the spectrum. Finally, energy information (i.e. velocity) was obtained from the RMS value, calculated over the audio waveform between the obtained note boundaries.
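As an illustration, the sketch below implements score-informed NMF with the standard Euclidean multiplicative updates (Lee and Seung); the exact update rules and initialisation used in the study may differ.

```python
import numpy as np

def score_informed_nmf(X, W0, H0, n_iter=200, eps=1e-10):
    """Factorise a magnitude spectrogram X (K x N) as X ~= WH (Eq. 1).

    W0 (K x R): initial spectral bases, e.g. harmonic templates for each
    of the R pitches in the score. H0 (R x N): initial pitch activations,
    set to zero wherever the score says a pitch cannot sound; zeros stay
    zero under multiplicative updates, which is what makes the
    factorisation score-informed."""
    W, H = W0.astype(float).copy(), H0.astype(float).copy()
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update pitch activations
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H
```

Note boundaries can then be read off the rows of H (e.g. by thresholding each pitch activation) and refined by the manual correction step described above.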

Similarly, we performed automatic beat extraction (Zapata et al. 2014), followed by manual correction to obtain the beat information (in seconds) over the audio signal.
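The study uses the multi-feature beat tracker of Zapata et al. (2014); as an illustration of the input and output of this step only, the sketch below substitutes librosa's default beat tracker (the file name is an assumption).

```python
import librosa

# Sketch: automatic beat extraction, to be followed by manual correction.
y, sr = librosa.load("performance.wav")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)  # beat positions in seconds
```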

Data-set Creation. Feature extraction from the score was performed by extracting local information about each note (e.g. pitch, duration) as well as the context in which the note occurs (e.g. previous/next interval, metrical strength, harmonic/melodic analysis); for an overview see Giraldo and Ramirez (2016a, b). A total of 27 descriptors were extracted for each note. Later, the deviations in tempo, measured in Beats Per Minute (BPM) and Inter Onset Interval (IOI), were calculated for each note/performer as the difference between the theoretical BPM/IOI values in the score and the corresponding values in the performance. Finally, we obtained data-sets for each of the nine performers and for each of the three performance deviations considered (i.e. energy, BPM, and IOI deviations). A total of 27 data-sets were obtained, where each instance consists of the feature set extracted for a note, and the considered deviation is the value to be predicted.
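The sketch below shows one plausible way to compute the per-note timing deviations; the function name is ours and the exact definitions used in the study may differ (the percentage form follows Fig. 2).

```python
import numpy as np

def timing_deviations(score_onsets, perf_onsets, score_bpm):
    """Per-note IOI and local-BPM deviations between score and performance.

    Both inputs are onset times in seconds for the same note sequence,
    aligned note-by-note."""
    ioi_score = np.diff(score_onsets)
    ioi_perf = np.diff(perf_onsets)
    ioi_dev = ioi_perf - ioi_score                        # IOI deviation in seconds
    bpm_perf = score_bpm * ioi_score / ioi_perf           # local tempo implied by each performed IOI
    bpm_dev = 100.0 * (bpm_perf - score_bpm) / score_bpm  # % BPM deviation, as plotted in Fig. 2
    return ioi_dev, bpm_dev
```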

Machine Learning Modelling. Each of the nine performer data-sets was used as both train and test set in an all-vs-all cross-validation fashion. This consisted of obtaining a predictive model for each performer (i.e. all performer data-sets were used as train sets), applying each model to all the performers (i.e. all performer data-sets were used as test sets), and finally obtaining a model evaluation for each train/test pair.
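A sketch of the all-vs-all scheme is given below, using the correlation-based evaluation described in the next paragraph. scikit-learn's MLPRegressor stands in for the one-hidden-layer ANN (the hidden-layer size is an assumption), and X[i], y[i] are the prepared feature matrix and deviation targets for performer i.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def all_vs_all(X, y, n_performers=9):
    """Train a model on each performer's data-set and evaluate it on every
    performer's data-set, returning the matrix of correlation coefficients
    (rows: train performer, columns: test performer) behind Fig. 3."""
    cc = np.zeros((n_performers, n_performers))
    for i in range(n_performers):
        model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000)  # one hidden layer; size assumed
        model.fit(X[i], y[i])
        for j in range(n_performers):
            pred = model.predict(X[j])
            cc[i, j] = np.corrcoef(pred, y[j])[0, 1]  # CC between predicted and actual deviations
    return cc
```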

Evaluation. A preliminary evaluation consisted of obtaining the correlations of the actual per-note deviations among all performers. Later, at the machine learning stage, the performance of the predictive models was assessed by computing the Correlation Coefficient (CC) between the values predicted by the model and the actual values in the test set. The algorithms considered were Support Vector Regression (SVR, with radial kernel), Regression Trees (RT, with pruning), and Artificial Neural Networks (ANN, fully connected with one hidden layer); CCs from preliminary tests are presented in Table 1. Given that the ANN outperformed the other algorithms in predicting all three expressive deviations considered, in this paper we report the CCs obtained with the ANN.

Table 1. Mean Correlation Coefficient (CC) comparison among models.
Fig. 2. BPM percentage of deviation among nine performers for each consecutive note.

3 Results

Figure 2 shows the measured deviations, as a percentage of BPM, for each consecutive note and each performer. A correspondence of peaks and valleys (with different amplitudes/degrees of deviation) can be noticed among performers. Figure 3 presents the scaled graph of the correlation coefficients obtained using ANNs. The numbers on the vertical axis indicate the performer data-sets (numbered from 1 to 9) used as train sets, whereas the horizontal axis represents the performer data-sets used as test sets. At each intersection, the colour map represents the correlation coefficient obtained using that pair of train/test data-sets. As expected, the diagonal shows higher correlations, representing the performance on the train set (i.e. train and test set are the same performer). Higher correlations can be found for the BPM and IOI deviations. Also, a similar pattern can be observed in the CCs obtained among performers. This might indicate that the majority of the performers introduce similar timing variations based on the information provided by the score. This tendency can be observed in Fig. 2 as well (e.g. in the ritardando introduced by most performers at the end of the piece). In contrast, lower correlations were obtained for the energy deviation models, which might indicate that decisions about the loudness of a note are less consistent among performers. However, other external factors, such as different recording conditions (e.g. the use of a different guitar, or recordings being made at different studios), might bias this result.

Fig. 3. Scaled graph of the correlations obtained for each performer model, for each of the three expressive deviations considered (from left to right: BPM, IOI, and energy). The vertical axis corresponds to the performer data used as train set (from 1 to 9), and the horizontal axis corresponds to the performer data used as test set (from 1 to 9).

4 Conclusion

In this paper we have presented a machine learning approach, based on computational modelling of expressive music performance, to study the correlations among the intrinsic expressive deviations that musicians introduce when performing a musical piece. We obtained recordings of the same musical piece by nine professional guitarists, in which no expressive indications were given and the performers freely chose the expressive actions to perform. We extracted descriptors from the score, and measured the deviations introduced by each performer in terms of BPM, IOI, and energy. We obtained machine learning models using ANNs for each performer, and cross-validated the models among interpreters based on the CC. Preliminary results indicate that performers take similar actions in terms of timing deviations, whereas lower correlations were obtained for energy deviations.