DOI: 10.1145/3424636.3426894

Deep Integration of Physical Humanoid Control and Crowd Navigation

Published: 22 November 2020

ABSTRACT

Many multi-agent navigation approaches make use of simplified representations such as disks. These simplifications allow for fast simulation of thousands of agents but limit simulation accuracy and fidelity. In this paper, we propose a fully integrated physical character control and multi-agent navigation method. In place of sample-complex online planning methods, we extend the use of recent deep reinforcement learning techniques. This extension improves on multi-agent navigation models and simulated humanoids by combining Multi-Agent and Hierarchical Reinforcement Learning. We train a single short-term goal-conditioned low-level policy to provide directed walking behaviour. This task-agnostic controller can be shared by higher-level policies that perform longer-term planning. The proposed approach produces reciprocal collision avoidance, robust navigation, and emergent crowd behaviours. Furthermore, it offers several key affordances not previously possible in multi-agent navigation, including tunable character morphology and physically accurate interactions with agents and the environment. Our results show that the proposed method outperforms prior methods across environments and tasks, and performs well in terms of zero-shot generalization over different numbers of agents and computation time.
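
The abstract describes a two-level policy: a single goal-conditioned low-level walking controller shared across agents, and higher-level per-agent policies that plan over longer horizons by emitting short-term goals. The sketch below is a minimal illustration of that structure, not the authors' implementation; the module names, observation and action dimensions, and the PyTorch-based setup are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's code) of the hierarchy outlined in the abstract:
# a task-agnostic, goal-conditioned low-level walker shared by a high-level
# navigation policy that produces short-term goals. Dimensions are illustrative.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class LowLevelWalker(nn.Module):
    """Task-agnostic controller: humanoid state + short-term goal -> joint actions."""

    def __init__(self, state_dim=197, goal_dim=2, action_dim=36):
        super().__init__()
        self.net = mlp(state_dim + goal_dim, action_dim)

    def forward(self, humanoid_state, goal):
        return torch.tanh(self.net(torch.cat([humanoid_state, goal], dim=-1)))


class HighLevelNavigator(nn.Module):
    """Per-agent planner: navigation observation (e.g. nearby agents, target
    direction) -> short-term goal consumed by the shared walker."""

    def __init__(self, obs_dim=64, goal_dim=2):
        super().__init__()
        self.net = mlp(obs_dim, goal_dim)

    def forward(self, nav_obs):
        return self.net(nav_obs)


if __name__ == "__main__":
    walker = LowLevelWalker()          # trained once, shared by all agents
    navigator = HighLevelNavigator()   # trained per task with multi-agent RL

    nav_obs = torch.randn(8, 64)       # batch of 8 agents
    humanoid_state = torch.randn(8, 197)

    goal = navigator(nav_obs)               # high level: where to walk next
    actions = walker(humanoid_state, goal)  # low level: how to walk there
    print(actions.shape)                    # torch.Size([8, 36])
```

Sharing one task-agnostic walker is what lets the higher-level navigation policy be retrained for new tasks without relearning locomotion, which is the reuse the abstract emphasizes.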


Supplemental Material

a15-haworth-video3.mp4
a15-haworth-video2.mp4
a15-haworth-video1.mp4


Published in

MIG '20: Proceedings of the 13th ACM SIGGRAPH Conference on Motion, Interaction and Games
October 2020, 190 pages
ISBN: 9781450381710
DOI: 10.1145/3424636
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
