
Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning

  • Regular paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

The development of a robust and versatile biped walking engine may be considered one of the hardest problems in Mobile Robotics. Even well-developed cities contain obstacles that make navigation by these agents without human assistance infeasible. Therefore, it is essential that they be able to dynamically restore their own balance when subjected to certain types of external disturbances. This article thus contributes an implementation of a Push Recovery controller that improves the performance of the walking engine used by a simulated humanoid agent in the RoboCup 3D Soccer Simulation League environment. This work applies Proximal Policy Optimization to learn a movement policy in this simulator. Our learned policy surpassed the baselines with statistical significance. Finally, we propose two approaches, based on Transfer Learning and Imitation Learning, to achieve a final policy that performs well across a wide range of disturbance directions.
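
As a brief illustration for readers unfamiliar with the algorithm, the snippet below sketches the clipped surrogate objective at the core of Proximal Policy Optimization (Schulman et al., 2017) in PyTorch. It is a minimal sketch, not the authors' implementation: the function name ppo_clip_loss and the clipping value 0.2 are illustrative assumptions rather than parameters reported in the paper (the actual hyperparameters appear in Table 4 of the appendix).

    import torch

    def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s),
        # computed from per-sample log-probabilities of the taken actions.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # PPO maximizes the element-wise minimum of the two surrogates;
        # the negative mean is returned as a loss suitable for gradient descent.
        return -torch.min(unclipped, clipped).mean()

    # Usage with random tensors standing in for rollout statistics.
    old_lp = torch.randn(64)
    new_lp = old_lp + 0.05 * torch.randn(64)
    adv = torch.randn(64)
    loss = ppo_clip_loss(new_lp, old_lp, adv)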

Acknowledgments

The authors thank ITAndroids’ sponsors: Altium, Cenic, Intel, ITAEx, MathWorks, Metinjo, Micropress, Polimold, Rapid, SolidWorks, STMicroelectronics, Wildlife Studios, and Virtual.PYXIS. A special thanks goes to Intel for providing the computational resources and specialized AI software. Finally, we are also grateful to all members of the ITAndroids team for developing the base code used in this research.

Funding

This research received no external funding.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the concept and design of the research. Dicksiano Melo is the main contributor: he developed the RL formulations, implemented the source code, executed the experiments, and prepared this manuscript. Marcos Maximo and Adilson Cunha served as advisors during the research, discussing ideas and revising the text. The final manuscript was revised and approved by all authors.

Corresponding author

Correspondence to Dicksiano C. Melo.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no conflicts of interest/competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Availability of data and material

No additional data or materials are available.

Code availability

gitlab.com/itandroids/open-projects/learning-push-recovery-strategies-for-bipedal-walking

Appendix: Experimental Parameters

Table 4 PPO Hyperparameters for JPL, WSC, RANP, RAS and RAU
Table 5 Experiment parameters for JPL, WSC, RANP, RAS and RAU

About this article

Cite this article

Melo, D.C., Maximo, M.R.O.A. & da Cunha, A.M. Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning. J Intell Robot Syst 106, 8 (2022). https://doi.org/10.1007/s10846-022-01656-7

