DOI: 10.1145/3394171.3413669

Dynamic Future Net: Diversified Human Motion Generation

Published: 12 October 2020

Abstract

Human motion modelling is crucial in many areas such as computer graphics, vision and virtual reality. Acquiring high-quality skeletal motions is difficult due to the need for specialized equipment and laborious manual post-processing, which necessitates maximizing the use of existing data to synthesize new data. However, this is challenging due to the intrinsic stochasticity of human motion dynamics, which manifests in both the short and long terms. In the short term, there is strong randomness within a few frames, e.g. one frame can be followed by multiple possible frames, leading to different motion styles; in the long term, there are non-deterministic action transitions. In this paper, we present Dynamic Future Net, a new deep learning model that explicitly addresses this motion stochasticity by constructing a generative model with non-trivial modelling capacity for temporal stochasticity. Given limited amounts of data, our model can generate a large number of high-quality motions with arbitrary duration and visually convincing variations in both space and time. We evaluate our model on a wide range of motions and compare it with state-of-the-art methods. Both qualitative and quantitative results show the superiority of our method in robustness, versatility and quality.

Supplementary Material

MP4 File (3394171.3413669.mp4)
In our work, we propose a method to generate infinitely long, random human motion that transitions between different actions. The main idea is that instead of predicting the next pose directly, we first predict the future motion distribution and then the next-pose distribution, from which we sample the human pose. We explain the different components of our method and show examples. We also demonstrate the diversity of the generated motion by sampling 4096 different trajectories from the same initialization.
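The two-stage sampling idea described above (predict a future-motion distribution, then a conditioned next-pose distribution, then sample) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the `predict_future_distribution` and `predict_pose_distribution` functions are hypothetical stand-ins for the learned networks, and the dimensions and history-window size are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

POSE_DIM = 63    # e.g. 21 joints x 3D coordinates (hypothetical)
LATENT_DIM = 16  # size of the future-motion latent (hypothetical)

def predict_future_distribution(history):
    # Stage 1 (stand-in): map the recent pose history to a Gaussian
    # over future-motion latents. A learned RNN would go here.
    mu = np.tanh(history[-1, :LATENT_DIM])
    sigma = np.full(LATENT_DIM, 0.1)
    return mu, sigma

def predict_pose_distribution(history, future_latent):
    # Stage 2 (stand-in): the next-pose distribution is conditioned
    # on the sampled future latent, not on the history alone.
    mu = history[-1] + 0.01 * np.resize(future_latent, POSE_DIM)
    sigma = np.full(POSE_DIM, 0.02)
    return mu, sigma

def generate(init_poses, n_frames):
    poses = list(init_poses)
    for _ in range(n_frames):
        hist = np.stack(poses[-8:])            # short history window
        f_mu, f_sig = predict_future_distribution(hist)
        z = rng.normal(f_mu, f_sig)            # sample a future-motion latent
        p_mu, p_sig = predict_pose_distribution(hist, z)
        poses.append(rng.normal(p_mu, p_sig))  # sample the next pose
    return np.stack(poses)

seed = rng.normal(size=(8, POSE_DIM))
motion = generate(seed, n_frames=32)
print(motion.shape)  # (40, 63): 8 seed frames + 32 generated frames
```

Because every frame is drawn from a distribution rather than regressed deterministically, repeated calls with the same seed poses yield different trajectories, which is the source of the diversity shown in the video.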




Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. generative models
  2. human motion
  3. neural networks

Qualifiers

  • Research-article

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%



Cited By

  • Decoupling Contact for Fine-Grained Motion Style Transfer. SIGGRAPH Asia 2024 Conference Papers, 1-11 (Dec 2024). DOI: 10.1145/3680528.3687609
  • Machine Learning Approaches for 3D Motion Synthesis and Musculoskeletal Dynamics Estimation: A Survey. IEEE Transactions on Visualization and Computer Graphics 30(8), 5810-5829 (Aug 2024). DOI: 10.1109/TVCG.2023.3308753
  • A Two-Part Transformer Network for Controllable Motion Synthesis. IEEE Transactions on Visualization and Computer Graphics 30(8), 5047-5062 (Aug 2024). DOI: 10.1109/TVCG.2023.3284402
  • Two-Person Interaction Augmentation with Skeleton Priors. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1900-1910 (Jun 2024). DOI: 10.1109/CVPRW63382.2024.00196
  • Rethinking Human Motion Prediction with Symplectic Integral. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2134-2143 (Jun 2024). DOI: 10.1109/CVPR52733.2024.00208
  • Human Motion Prediction Under Unexpected Perturbation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1501-1511 (Jun 2024). DOI: 10.1109/CVPR52733.2024.00149
  • Nonverbal social behavior generation for social robots using end-to-end learning. The International Journal of Robotics Research 43(5), 716-728 (Nov 2023). DOI: 10.1177/02783649231207974
  • Objective Evaluation Metric for Motion Generative Models: Validating Fréchet Motion Distance on Foot Skating and Over-smoothing Artifacts. Proceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games, 1-11 (Nov 2023). DOI: 10.1145/3623264.3624443
  • Controllable Group Choreography Using Contrastive Diffusion. ACM Transactions on Graphics 42(6), 1-14 (Dec 2023). DOI: 10.1145/3618356
  • Neural Motion Graph. SIGGRAPH Asia 2023 Conference Papers, 1-11 (Dec 2023). DOI: 10.1145/3610548.3618181
