ABSTRACT
Many multi-agent navigation approaches rely on simplified agent representations, such as disks. These simplifications allow thousands of agents to be simulated quickly but limit simulation accuracy and fidelity. In this paper, we propose a fully integrated method for physical character control and multi-agent navigation. In place of sample-inefficient, complex online planning methods, we extend recent deep reinforcement learning techniques. This extension improves on multi-agent navigation models and simulated humanoids by combining multi-agent and hierarchical reinforcement learning. We train a single goal-conditioned low-level policy that provides short-term directed walking behaviour. This task-agnostic controller can be shared by higher-level policies that perform longer-term planning. The proposed approach produces reciprocal collision avoidance, robust navigation, and emergent crowd behaviours. Furthermore, it offers several key affordances not previously possible in multi-agent navigation, including tunable character morphology and physically accurate interactions between agents and with the environment. Our results show that the proposed method outperforms prior methods across environments and tasks, performs well in terms of computation time, and generalizes zero-shot over different numbers of agents.
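The two-level structure described above can be sketched in code: a per-agent high-level planner emits a short-term goal at a coarse timescale, and a single goal-conditioned low-level policy, shared by all agents, turns each goal into low-level walking actions at a higher frequency. This is a minimal illustrative sketch only; all function names, state dimensions, and the 1 m goal horizon are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_level_policy(char_state, goal, weights):
    """Task-agnostic walking controller: maps (character state, short-term
    goal) to bounded joint-level actions. A linear map stands in for the
    trained neural network."""
    x = np.concatenate([char_state, goal])
    return np.tanh(weights @ x)  # bounded torques / PD targets

def high_level_policy(nav_target, agent_pos):
    """Per-agent planner: picks a nearby waypoint toward the navigation
    target, re-planned at a coarser timescale than the low-level steps."""
    direction = nav_target - agent_pos
    dist = np.linalg.norm(direction)
    step = min(dist, 1.0)  # assumed short-term goal at most 1 m away
    return agent_pos + (direction / max(dist, 1e-8)) * step

# The SAME low-level weights are shared across all agents; only the
# high-level goals differ per agent.
shared_weights = rng.standard_normal((8, 5))  # 8 actions, 3+2 inputs
targets = [np.array([5.0, 0.0]), np.array([0.0, 5.0])]
positions = [np.zeros(2), np.ones(2)]

for pos, target in zip(positions, targets):
    goal = high_level_policy(target, pos)
    char_state = np.zeros(3)  # placeholder proprioceptive features
    for _ in range(10):  # low-level runs at a higher control frequency
        action = low_level_policy(char_state, goal, shared_weights)
        assert action.shape == (8,) and np.all(np.abs(action) <= 1.0)
```

Sharing one low-level controller is the key affordance: higher-level policies for different tasks reuse the same walking skill rather than relearning locomotion per task.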
Deep Integration of Physical Humanoid Control and Crowd Navigation