Abstract
In this paper we show how the task of motion tracking for physically simulated characters can be solved using supervised learning, by optimizing a policy directly via back-propagation. To achieve this, we make use of a world model trained to approximate a specific subset of the environment's transition function, effectively acting as a differentiable physics simulator through which the policy can be optimized to minimize the tracking error. Compared to popular model-free methods of physically simulated character control, which primarily make use of Proximal Policy Optimization (PPO), we find that direct optimization of the policy via our approach consistently achieves a higher quality of control in a shorter training time, with reduced sensitivity to the rate of experience gathering, dataset size, and distribution.
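To make the core idea concrete, here is a minimal sketch, not the paper's implementation: the learned world model is reduced to a fixed linear map and the policy to a linear layer, so the gradient of the tracking error can be propagated through the model by hand. All names and dimensions here are illustrative assumptions; in the actual method both the world model and the policy are neural networks, and the rollout spans many steps.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 2  # state and action dimensions (illustrative)

# "World model": a frozen linear approximation of the transition function,
# s_next = M_s @ s + M_a @ a.  In the paper this is a learned neural network;
# a fixed linear map keeps the sketch self-contained.
M_s = rng.normal(scale=0.3, size=(S, S))
M_a = rng.normal(scale=0.3, size=(S, A))

P = np.zeros((A, S))           # linear "policy": a = P @ s
s = rng.normal(size=S)         # current simulated character state
s_target = rng.normal(size=S)  # next pose from the motion clip being tracked

def tracking_loss(P):
    a = P @ s
    s_next = M_s @ s + M_a @ a      # one differentiable simulation step
    return np.sum((s_next - s_target) ** 2)

# Optimize the policy directly by gradient descent *through* the world model.
lr = 0.05
losses = [tracking_loss(P)]
for _ in range(200):
    a = P @ s
    err = (M_s @ s + M_a @ a) - s_target
    grad_P = 2.0 * np.outer(M_a.T @ err, s)  # chain rule through the model
    P -= lr * grad_P
    losses.append(tracking_loss(P))
```

After these updates `losses[-1]` is well below `losses[0]`: the tracking error shrinks without any reward signal or policy-gradient estimator, which is the supervised-learning character of the approach.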
SuperTrack: Motion Tracking for Physically Simulated Characters Using Supervised Learning