Abstract
The development of a robust and versatile biped walking engine may be considered one of the hardest problems in mobile robotics. Even well-developed cities contain obstacles that make navigation by these agents infeasible without human assistance. Therefore, it is essential that they be able to dynamically restore their balance when subjected to certain types of external disturbances. This article contributes an implementation of a Push Recovery controller that improves the performance of the walking engine used by a simulated humanoid agent in the RoboCup 3D Soccer Simulation League environment. This work applies Proximal Policy Optimization to learn a movement policy in this simulator. Our learned policy was able to surpass the baselines with statistical significance. Finally, we propose two approaches, based on Transfer Learning and Imitation Learning, to achieve a final policy that performs well across a wide range of disturbance directions.
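The core of the learning approach named in the abstract is Proximal Policy Optimization, whose stability comes from clipping the policy-update ratio. The sketch below is an illustrative implementation of PPO's clipped surrogate loss only, not the authors' actual training code; the function name and the use of NumPy are assumptions for the example.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, epsilon=0.2):
    """Clipped surrogate objective in the style of PPO.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - epsilon, 1 + epsilon], bounding how far a single update can
    move the policy -- the property that makes PPO stable enough for
    locomotion tasks such as push recovery.
    """
    ratio = np.exp(new_logp - old_logp)          # r in log space for stability
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # PPO maximizes the minimum of the clipped and unclipped objectives;
    # negating turns it into a loss suitable for gradient descent.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

With identical old and new log-probabilities the ratio is 1 and the loss reduces to the negative mean advantage, while a policy that drifts far from the sampling policy has its objective capped by the clip, which is what limits destructive updates during training.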
Acknowledgments
The authors thank ITAndroids’ sponsors: Altium, Cenic, Intel, ITAEx, MathWorks, Metinjo, Micropress, Polimold, Rapid, SolidWorks, STMicroelectronics, Wildlife Studios, and Virtual.PYXIS. A special thanks goes to Intel for providing the computational resources and specialized AI software. Finally, we are also grateful to all members of the ITAndroids team for developing the base code used in this research.
Funding
This research received no external funding.
Author information
Authors and Affiliations
Contributions
All authors contributed to the concept and design of the research. Dicksiano Melo is the main contributor: he developed the RL formulations, implemented the source code, executed the experiments, and prepared this manuscript. Marcos Maximo and Adilson Cunha served as advisors during the research, discussing ideas and revising the text. All authors revised and approved the final manuscript.
Corresponding author
Ethics declarations
Conflicts of interest/Competing interests
The authors declare that they have no conflicts of interest/competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Availability of data and material
No extra data or material is available.
Code availability
gitlab.com/itandroids/open-projects/learning-push-recovery-strategies-for-bipedal-walking
Appendix: Experimental Parameters
Rights and permissions
About this article
Cite this article
Melo, D.C., Maximo, M.R.O.A. & da Cunha, A.M. Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning. J Intell Robot Syst 106, 8 (2022). https://doi.org/10.1007/s10846-022-01656-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10846-022-01656-7
Keywords