Research article
DOI: 10.1145/3638530.3664109

A User-Guided Generation Framework for Personalized Music Synthesis Using Interactive Evolutionary Computation

Published: 01 August 2024

Abstract

Generative artificial intelligence (AI) has advanced notably in the domain of music synthesis. However, a perceived lack of creativity in the generated content has drawn significant public attention. To address this, this paper introduces a novel approach to personalized music synthesis that incorporates human-in-the-loop generation. The method combines the strengths of interactive evolutionary computation, known for capturing user preferences, and generative adversarial networks, known for their capacity to autonomously produce high-quality music. The primary objective of this integration is to improve the credibility and diversity of generative AI in music synthesis and to foster computational artistic creativity in humans. In addition, a user-friendly interactive music player has been designed to support users during the music synthesis process. The proposed method exemplifies a paradigm in which users manipulate the latent space through human-machine interaction, underscoring the pivotal role of humans in synthesizing diverse and creative music.
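The loop the abstract describes, in which user feedback steers a search through a generator's latent space, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the latent size `LATENT_DIM`, the population size, the generic operators (`uniform_crossover`, `mutate`), and `simulated_user`, which stands in for a human listener. A real system would decode each latent vector with a trained generator and play the audio to the user before collecting ratings.

```python
import random

LATENT_DIM = 8  # hypothetical latent size, chosen only for this sketch


def random_latent(rng):
    """Sample one latent vector from a standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def uniform_crossover(a, b, rng):
    """Take each gene from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(z, rng, rate=0.2, sigma=0.3):
    """Add Gaussian noise to a random subset of genes."""
    return [x + rng.gauss(0.0, sigma) if rng.random() < rate else x for x in z]


def next_generation(population, ratings, rng, n_parents=2):
    """Breed a new population from the latents that were rated highest."""
    ranked = sorted(zip(ratings, population), key=lambda p: p[0], reverse=True)
    parents = [z for _, z in ranked[:n_parents]]
    children = [list(parents[0])]  # elitism: keep the top-rated latent unchanged
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(uniform_crossover(a, b, rng), rng))
    return children


def simulated_user(z):
    """Placeholder for a human rating (assumption): prefers latents near the origin."""
    return -sum(x * x for x in z)


rng = random.Random(0)
population = [random_latent(rng) for _ in range(6)]
for generation in range(5):
    # A real system would synthesize audio from each z and ask the user to rate it.
    ratings = [simulated_user(z) for z in population]
    population = next_generation(population, ratings, rng)
```

Because the top-rated latent is carried over unchanged, the best rating in the population never decreases from one generation to the next under a consistent rater. Human ratings are noisier than this placeholder, so practical systems usually keep populations small to limit listener fatigue.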

Published In

GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2024
2187 pages
ISBN: 9798400704956
DOI: 10.1145/3638530

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. human-AI interaction
  2. co-creativity
  3. music synthesis
  4. interactive evolutionary computation
  5. generative adversarial network
  6. human-in-the-loop

Qualifiers

  • Research-article

Conference

GECCO '24 Companion

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
