Abstract
The modeling of artificial, human-level creativity is becoming increasingly achievable. In recent years, neural networks have been successfully applied to different tasks such as image and music generation, demonstrating their great potential for realizing computational creativity. The fuzzy definition of creativity, combined with the varying goals of the evaluated generative systems, however, makes subjective evaluation appear to be the only viable methodology. We review the evaluation of generative music systems and discuss the inherent challenges of their evaluation. Although subjective evaluation should always be the ultimate choice for the evaluation of creative results, researchers unfamiliar with rigorous subjective experiment design and without the resources to execute a large-scale experiment face challenges regarding the reliability, validity, and replicability of their results. In numerous studies, this leads to the reporting of insignificant and possibly irrelevant results and a lack of comparability with similar and previous generative systems. Therefore, we propose a set of simple, musically informed objective metrics that enable an objective and reproducible way of evaluating and comparing the output of generative music systems. We demonstrate the usefulness of the proposed metrics with several experiments on real-world data.
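As a minimal sketch of what such a musically informed objective metric might look like in practice (not the authors' reference implementation; the feature choice, sample data, and comparison measure here are illustrative assumptions), consider computing a 12-bin pitch-class histogram per generated sample and comparing its average against a reference set:

```python
# Illustrative sketch: a pitch-class histogram as a musically informed feature,
# summarized per set and compared in a simple, reproducible way.
import numpy as np

def pitch_class_histogram(midi_pitches):
    """Normalized 12-bin histogram of pitch classes (C, C#, ..., B)."""
    counts = np.bincount(np.asarray(midi_pitches) % 12, minlength=12)
    return counts / max(counts.sum(), 1)

def mean_and_std(feature_matrix):
    """Per-dimension (element-wise) mean and standard deviation of a feature set."""
    features = np.asarray(feature_matrix)
    return features.mean(axis=0), features.std(axis=0)

# Hypothetical data: each inner list is one sample's MIDI pitch sequence.
generated = [[60, 62, 64, 65, 67], [60, 60, 67, 69, 72]]
reference = [[62, 64, 66, 67, 69], [59, 62, 66, 69, 71]]

gen_features = np.array([pitch_class_histogram(p) for p in generated])
ref_features = np.array([pitch_class_histogram(p) for p in reference])

gen_mean, _ = mean_and_std(gen_features)
ref_mean, _ = mean_and_std(ref_features)

# A simple, reproducible comparison: distance between the two mean histograms.
print("L1 distance between mean pitch-class histograms:",
      np.abs(gen_mean - ref_mean).sum())
```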






Notes
The deviation here refers to an element-wise standard deviation, which retains the dimension of each feature.
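A minimal illustration of this distinction (assuming NumPy-style conventions; the numbers are hypothetical): for an N x D matrix of D-dimensional features, the element-wise standard deviation returns one value per feature dimension rather than a single scalar.

```python
# Element-wise (per-dimension) standard deviation vs. a single scalar value.
import numpy as np

# Three samples, each described by a three-dimensional feature vector.
features = np.array([[0.2, 0.5, 0.3],
                     [0.1, 0.6, 0.3],
                     [0.3, 0.4, 0.3]])

elementwise_std = features.std(axis=0)  # shape (3,): one value per dimension
scalar_std = features.std()             # single scalar over all entries

print(elementwise_std, scalar_std)
```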
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Yang, LC., Lerch, A. On the evaluation of generative models in music. Neural Comput & Applic 32, 4773–4784 (2020). https://doi.org/10.1007/s00521-018-3849-7