Research article · DOI: 10.1145/3610543.3626164

Motion to Dance Music Generation using Latent Diffusion Model

Published: 28 November 2023

Editorial Notes

The authors requested minor, non-substantive changes to the Version of Record and, in accordance with ACM policies, a Corrected Version of Record was published on December 29, 2023. For reference purposes, the original Version of Record may still be accessed via the Supplemental Material section on this page.

ABSTRACT

The role of music in games and animation, particularly in dance content, is essential for creating immersive and entertaining experiences. Although recent studies have made strides in generating dance music from videos, their practicality for integrating music into games and animation remains limited. In this context, we present a method capable of generating plausible dance music from 3D motion data and genre labels. Our approach combines a UNET-based latent diffusion model with a pre-trained VAE model. To evaluate the proposed model, we employ metrics that assess various audio properties, including beat alignment, audio quality, motion-music correlation, and genre score. The quantitative results show that our approach outperforms previous methods. Furthermore, we demonstrate that our model can generate audio that seamlessly fits in-the-wild motion data. This capability enables us to create plausible dance music that complements the dynamic movements of characters and enhances the overall audiovisual experience in interactive media. Examples from our model are available at https://dmdproject.github.io/.
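To make the pipeline described above concrete, the following is a minimal sketch of conditional latent diffusion sampling, where motion features and a genre embedding condition the noise predictor. This is an illustration only: the paper's actual UNET denoiser, pre-trained VAE decoder, and conditioning encoders are not reproduced, and all dimensions, weights, and the toy `denoiser` function are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a latent "spectrogram" of T frames x D channels.
T, D = 16, 8
n_steps = 50

# Standard DDPM linear noise schedule (Ho et al., 2020).
betas = np.linspace(1e-4, 0.02, n_steps)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Hypothetical conditioning inputs and projection weights.
motion_feat = rng.normal(size=(T, 24))                  # e.g. per-frame joint features
genre_emb = np.tile(rng.normal(size=(1, 4)), (T, 1))    # broadcast genre-label embedding
W_m = rng.normal(size=(24, D)) * 0.05
W_g = rng.normal(size=(4, D)) * 0.05

def denoiser(z_t, t, motion_feat, genre_emb):
    """Stand-in for the UNET noise predictor: a fixed linear
    projection of the conditioning signals (illustration only)."""
    cond = np.tanh(motion_feat @ W_m + genre_emb @ W_g)
    return 0.1 * z_t + cond  # predicted noise eps_hat

# DDPM ancestral sampling in latent space; in the full system the
# resulting z_0 would be decoded by the pre-trained VAE into audio.
z = rng.normal(size=(T, D))
for t in range(n_steps - 1, -1, -1):
    eps_hat = denoiser(z, t, motion_feat, genre_emb)
    mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    noise = rng.normal(size=z.shape) if t > 0 else 0.0
    z = mean + np.sqrt(betas[t]) * noise

print(z.shape)  # → (16, 8)
```

The loop follows the standard DDPM reverse process; swapping the toy denoiser for a trained UNET and decoding `z` with a VAE would recover the general shape of a motion-to-music latent diffusion system.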


Supplemental Material

Video.mp4 (mp4, 89.9 MB)


Published in

SA '23: SIGGRAPH Asia 2023 Technical Communications
November 2023, 127 pages
ISBN: 9798400703140
DOI: 10.1145/3610543

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article; Research; Refereed limited

Overall Acceptance Rate: 178 of 869 submissions, 20%
Article Metrics

• Downloads (last 12 months): 217
• Downloads (last 6 weeks): 42
