
Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning

  • Regular paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

The development of a robust and versatile biped walking engine may be considered one of the hardest problems in Mobile Robotics. Even well-developed cities contain obstacles that make navigation by these agents without human assistance infeasible. Therefore, it is essential that they be able to dynamically restore their own balance when subjected to certain types of external disturbances. This article thus contributes an implementation of a Push Recovery controller that improves the performance of the walking engine used by a simulated humanoid agent in the RoboCup 3D Soccer Simulation League environment. This work applies Proximal Policy Optimization to learn a movement policy in this simulator. Our learned policy surpassed the baselines with statistical significance. Finally, we propose two approaches, based on Transfer Learning and Imitation Learning, to achieve a final policy that performs well across a wide range of disturbance directions.
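
As a brief illustration for readers unfamiliar with the algorithm, the snippet below sketches the clipped surrogate objective at the core of Proximal Policy Optimization (Schulman et al., 2017) in PyTorch. It is a minimal sketch, not the authors' implementation: the function name ppo_clip_loss and the clipping value 0.2 are illustrative assumptions rather than parameters reported in the paper (the actual hyperparameters appear in Table 4 of the appendix).

    import torch

    def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s),
        # computed from per-sample log-probabilities of the taken actions.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # PPO maximizes the element-wise minimum of the two surrogates;
        # the negative mean is returned as a loss suitable for gradient descent.
        return -torch.min(unclipped, clipped).mean()

    # Usage with random tensors standing in for rollout statistics.
    old_lp = torch.randn(64)
    new_lp = old_lp + 0.05 * torch.randn(64)
    adv = torch.randn(64)
    loss = ppo_clip_loss(new_lp, old_lp, adv)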

Acknowledgments

The authors thank ITAndroids’ sponsors: Altium, Cenic, Intel, ITAEx, MathWorks, Metinjo, Micropress, Polimold, Rapid, SolidWorks, STMicroelectronics, Wildlife Studios, and Virtual.PYXIS. A special thanks goes to Intel for providing the computational resources and specialized AI software. Finally, we are also grateful to all members of the ITAndroids team for developing the base code used in this research.

Funding

This research received no external funding.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the concept and design of the research. Dicksiano Melo is the main contributor: he developed the RL formulations, implemented the source code, executed the experiments, and prepared this manuscript. Marcos Maximo and Adilson Cunha served as advisors during the research, discussing ideas and revising the text. The final manuscript was revised and approved by all authors.

Corresponding author

Correspondence to Dicksiano C. Melo.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no conflicts of interest/competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Availability of data and material

No additional data or materials are available.

Code availability

gitlab.com/itandroids/open-projects/learning-push-recovery-strategies-for-bipedal-walking

Appendix: Experimental Parameters

Table 4 PPO Hyperparameters for JPL, WSC, RANP, RAS and RAU
Table 5 Experiment parameters for JPL, WSC, RANP, RAS and RAU

About this article

Cite this article

Melo, D.C., Maximo, M.R.O.A. & da Cunha, A.M. Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning. J Intell Robot Syst 106, 8 (2022). https://doi.org/10.1007/s10846-022-01656-7

