Learning Humanoid Robot Running Motions with Symmetry Incentive through Proximal Policy Optimization

Melo, Luckeciano C.; Melo, Dicksiano C.; Maximo, Marcos R. O. A.

doi:10.1007/s10846-021-01355-9

Regular Paper
Published: 03 June 2021

Volume 102, article number 54, (2021)

Journal of Intelligent & Robotic Systems

Abstract

This article contributes a methodology based on deep reinforcement learning for developing running skills in a humanoid robot with no prior knowledge. Specifically, the learning algorithm is Proximal Policy Optimization (PPO). The chosen application domain is the RoboCup 3D Soccer Simulation (Soccer 3D), a competition in which teams of 11 autonomous agents each compete in simulated soccer matches. In our approach, the state vector used as the neural network’s input consists of raw sensor measurements or quantities that could be obtained through sensor fusion, while the actions are joint positions, which are sent to joint controllers. Our running behavior outperforms the state of the art in sprint speed by approximately 50%. We present results regarding the training procedure and also evaluate the controllers in terms of speed, reliability, and human similarity. Since the running policies with top speed display asymmetric motions, we also investigate a technique to encourage symmetry in the sagittal plane. Finally, we discuss key factors that led us to surpass previous results in the literature and share some ideas for future research.
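
For readers unfamiliar with PPO, the following is a minimal sketch of its clipped surrogate objective as described in the original PPO paper (Schulman et al., 2017). It is not the authors' implementation: the function name `ppo_clip_objective`, the list-based inputs, and the default `clip_eps=0.2` are illustrative assumptions, and the abstract does not state the hyperparameters used in this work.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the sampled actions under
    the current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is an assumed illustrative value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is what allows several gradient steps to be taken on the same batch.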

Acknowledgements

The authors thank ITAndroids’ sponsors: Altium, Cenic, Intel, ITAEx, MathWorks, Metinjo, Micropress, Polimold, Rapid, SolidWorks, STMicroelectronics, Wildlife Studios, and Virtual.PYXIS. A special thanks goes to Intel for providing the computational resources and specialized AI software. Finally, we are also grateful to all members of the ITAndroids team, especially those from Soccer 3D, for developing the base code used in this research.

Author information

Authors and Affiliations

Autonomous Computational Systems Lab (LAB-SCA), Computer Science Division, Aeronautics Institute of Technology, Praça Marechal Eduardo Gomes, 50, Vila das Acácias, 12228-900, São José dos Campos, SP, Brazil
Luckeciano C. Melo, Dicksiano C. Melo & Marcos R. O. A. Maximo

Corresponding author

Correspondence to Luckeciano C. Melo.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors Luckeciano C. Melo and Dicksiano C. Melo contributed equally to this work.

About this article

Cite this article

Melo, L.C., Melo, D.C. & Maximo, M.R.O.A. Learning Humanoid Robot Running Motions with Symmetry Incentive through Proximal Policy Optimization. J Intell Robot Syst 102, 54 (2021). https://doi.org/10.1007/s10846-021-01355-9

Received: 16 October 2020
Accepted: 24 February 2021
Published: 03 June 2021
DOI: https://doi.org/10.1007/s10846-021-01355-9
