Learning to exploit passive compliance for energy-efficient gait generation on a compliant humanoid


Abstract

Modern humanoid robots include not only active compliance but also passive compliance. Apart from improved safety and dependability, the availability of passive elements, such as springs, opens up new possibilities for improving energy efficiency. With this in mind, this paper addresses the challenging open problem of exploiting passive compliance for the purpose of energy-efficient humanoid walking. To this end, we develop a method comprising two parts: an optimization part that finds an optimal vertical center-of-mass trajectory, and a walking pattern generator part that uses this trajectory to produce a dynamically balanced gait. For the optimization part, we propose a reinforcement learning approach that dynamically evolves the policy parametrization during the learning process. By gradually increasing the representational power of the policy parametrization, it manages to find better policies in a faster and more computationally efficient way. For the walking pattern generator part, we develop a variable-center-of-mass-height ZMP-based bipedal walking pattern generator. The method is tested in real-world experiments with the bipedal robot COMAN and achieves a significant 18% reduction in electric energy consumption by learning to efficiently use the passive compliance of the robot.
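
As a rough illustration of the evolving policy parametrization described above, the following Python sketch encodes the vertical CoM trajectory over one gait cycle by the heights of a small set of spline knots and periodically doubles the number of knots, re-fitting the refined representation to the current best policy before continuing the search. This is a minimal sketch under stated assumptions: the simple random-search update stands in for the actual reinforcement learning algorithm, and the function names, parameter values, and cost callback (rollout_cost, standing for measured energy per gait cycle) are illustrative, not part of the paper (see Note 1 below for the authors' implementation).

```python
import numpy as np
from scipy.interpolate import CubicSpline

def refine_policy(knot_heights, new_num_knots, cycle_time):
    """Re-encode the current policy (spline knot heights of the vertical CoM
    trajectory over one gait cycle) with more knots, preserving its shape."""
    t_old = np.linspace(0.0, cycle_time, len(knot_heights))
    t_new = np.linspace(0.0, cycle_time, new_num_knots)
    return CubicSpline(t_old, knot_heights)(t_new)

def evolving_policy_search(rollout_cost, z_nominal, cycle_time,
                           num_knots=4, max_knots=32,
                           rollouts_per_stage=20, sigma=0.01):
    """Toy search loop whose policy parametrization is gradually refined
    (here: the number of spline knots is doubled stage by stage)."""
    best = np.full(num_knots, z_nominal)          # start from a constant CoM height
    best_cost = rollout_cost(best, cycle_time)    # hypothetical energy measurement
    while True:
        for _ in range(rollouts_per_stage):
            candidate = best + sigma * np.random.randn(num_knots)
            cost = rollout_cost(candidate, cycle_time)
            if cost < best_cost:
                best, best_cost = candidate, cost
        if num_knots >= max_knots:
            break
        num_knots *= 2                            # evolve: increase representational power
        best = refine_policy(best, num_knots, cycle_time)
    return best, best_cost
```

The key design point illustrated is that each refinement re-fits the finer parametrization to the trajectory encoded by the coarser one, so learning progress is preserved when the representational power is increased.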


Notes

  1. https://github.com/petar-kormushev/evolving-policy-parametrization.

  2. Spring deflections are mechanically limited to within 11.25 degrees in COMAN.

References

  • Abdolmaleki, A., Lau, N., Reis, L. P., Peters, J., & Neumann, G. (2016). Contextual policy search for linear and nonlinear generalization of a humanoid walking controller. Journal of Intelligent and Robotic Systems, 83(3), 393–408.

  • Amran, C. A., Ugurlu, B., & Kawamura, A. (2010). Energy and torque efficient ZMP-based bipedal walking with varying center of mass height. In Proceedings of the IEEE international workshop on advanced motion control (pp. 408–413). Nagaoka, Japan.

  • Bernstein, A., & Shimkin, N. (2010). Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains. Machine Learning, 81(3), 359–397.

  • Calandra, R., Seyfarth, A., Peters, J., & Deisenroth, M. P. (2014). An experimental comparison of Bayesian optimization for bipedal locomotion. In Proceedings of the 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong.

  • Carpentier, J., Tonneau, S., Naveau, M., Stasse, O., & Mansard, N. (2016). A versatile and efficient pattern generator for generalized legged locomotion. In Proceedings of the IEEE international conference on robotics and automation (ICRA) (pp. 1–6). Stockholm, Sweden.

  • Choi, Y., Kim, D., Oh, Y., & You, B. (2007). Posture/walking control for humanoid robot based on resolution of CoM Jacobian with embedded motion. IEEE Transactions on Robotics, 23(6), 1285–1293.

  • Coates, A., Abbeel, P., & Ng, A. Y. (2009). Apprenticeship learning for helicopter control. Communications of the ACM, 52(7), 97–105.

  • Deisenroth, M. P., Calandra, R., Seyfarth, A., & Peters, J. (2012). Toward fast policy search for learning legged locomotion. In 2012 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 1787–1792). Algarve, Portugal: IEEE.

  • Geyer, H., Seyfarth, A., & Blickhan, R. (2006). Compliant leg behaviour explains basic dynamics of walking and running. Proceedings of the Royal Society B: Biological Sciences, 273(1603), 2861–2867.

  • Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. Advanced Robotics, 21(13), 1521–1544.

  • Harada, K., Kajita, S., Kaneko, K., & Hirukawa, H. (2004). An analytical method on real-time gait planning for a humanoid robot. International Journal of Humanoid Robotics, 3(1), 1–19.

  • Herzog, A., Schaal, S., & Righetti, L. (2016). Structured contact force optimization for kino-dynamic motion generation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Daejeon, Korea (pp. 1–6).

  • Hu, Y., Felis, M., & Mombaur, K. (2014). Compliance analysis of human leg joints in level ground walking with an optimal control approach. In Proceedings of the IEEE international conference on humanoid robots (humanoids), Madrid, Spain (pp. 881–886).

  • Ishikawa, M., Komi, P. V., Grey, M. J., Lepola, V., & Bruggemann, P. G. (2005). Muscle-tendon interaction and elastic energy usage in human walking. The Journal of Applied Physiology, 99(2), 603–608.

  • Jafari, A., Tsagarakis, N. G., & Caldwell, D. G. (2013). A novel intrinsically energy efficient actuator with adjustable stiffness (AwAS). IEEE/ASME Transactions on Mechatronics, 18(1), 355–365.

  • Kagami, S., Kitagawa, T., Nishiwaki, K., Sugihara, T., Inaba, T., & Inoue, H. (2002). A fast dynamically equilibrated walking trajectory generation method of humanoid robot. Autonomous Robots, 2(1), 71–82.

  • Kajita, S., Kanehiro, F., Kaneko, K., Fujiwara, K., Yokoi, K., & Hirukawa, H. (2003). Biped walking pattern generation by using preview control. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Taipei, Taiwan (pp. 1620–1626).

  • Kober, J., & Peters, J. (2009). Learning motor primitives for robotics. In Proceedings of the IEEE international conference on robotics and automation (ICRA) (pp. 2112–2118). Kobe, Japan.

  • Kober, J., & Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.

  • Koch, K. H., Clever, D., Mombaur, K., & Endres, D. (2015). Learning movement primitives from optimal and dynamically feasible trajectories for humanoid walking. In Proceedings of the IEEE-RAS international conference on humanoid robots (Humanoids) (pp. 866–873). Seoul, Korea.

  • Kohl, N., & Stone, P. (2004). Machine learning for fast quadrupedal locomotion. In Proceedings of the national conference on artificial intelligence (pp. 611–616). Menlo Park, CA: AAAI Press.

  • Kormushev, P., Calinon, S., & Caldwell, D. G. (2010). Robot motor skill coordination with EM-based reinforcement learning. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan (pp. 3232–3237).

  • Kormushev, P., Nenchev, D. N., Calinon, S., & Caldwell, D. G. (2011a). Upper-body kinesthetic teaching of a free-standing humanoid robot. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Shanghai, China.

  • Kormushev, P., Ugurlu, B., Calinon, S., Tsagarakis, N. G., & Caldwell, D. G. (2011b). Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems(IROS), San Francisco, USA (pp. 318–324).

  • Liu, Q., Zhao, J., Schutz, S., & Berns, K. (2015). Adaptive motor patterns and reflexes for bipedal locomotion on rough terrain. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, Hamburg, Germany (pp. 3856–3861).

  • McGeer, T. (1990). Passive dynamic walking. International Journal of Robotics Research, 9(2), 62–82.

  • Minekata, H., Seki, H., & Tadakuma, S. (2008). A study of energy-saving shoes for robot considering lateral plane motion. IEEE Transactions on Industrial Electronics, 55(3), 1271–1276.

  • Miyamoto, H., Morimoto, J., Doya, K., & Kawato, M. (2004). Reinforcement learning with via-point representation. Neural Networks, 17, 299–305.

  • Moore, A. W., & Atkeson, C. G. (1995). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning, 21, 199–233.

  • Morimoto, J., & Atkeson, C. G. (2007). Learning biped locomotion: Application of poincare-map-based reinforcement learning. IEEE Robotics and Automation Magazine, 14(2), 41–51.

  • Orin, D. E., Goswami, A., & Lee, S.-H. (2013). Centroidal dynamics of a humanoid robot. Autonomous Robots, 35(2), 161–176.

  • Ortega, J. D., & Farley, C. T. (2005). Minimizing center of mass vertical movement increases metabolic cost in walking. The Journal of Applied Physiology, 581(9), 2099–2107.

  • Pastor, P., Kalakrishnan, M., Chitta, S., Theodorou, E., & Schaal, S. (2011). Skill learning and task outcome prediction for manipulation. In International conference on robotics and automation (ICRA), Shanghai, China.

  • Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Beijing, China.

  • Peters, J., & Schaal, S. (2008a). Natural actor-critic. Neurocomputing, 71(7–9), 1180–1190.

  • Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.

  • Rosado, J., Silva, F., & Santos, V. (2015). Biped walking learning from imitation using dynamic movement primitives. In L. P. Reis, A. P. Moreira, P. U. Lima, L. Montano, & V. Munoz Martinez (Eds.), Advances in intelligent systems and computing (pp. 185–196). Switzerland: Springer International Publishing.

  • Rosenstein, M. T., Barto, A. G., & Van Emmerik, R. E. A. (2006). Learning at the level of synergies for a robot weightlifter. Robotics and Autonomous Systems, 54(8), 706–717.

  • Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transaction of the Royal Society of London: Series B, Biological Sciences, 358(1431), 537–547.

  • Shafii, N., Lau, N., & Reis, L. P. (2015). Learning to walk fast: Optimized hip height movement for simulated and real humanoid robots. Journal of Intelligent and Robotic Systems, 80(3), 555–571.

  • Shen, H., Yosinski, J., Kormushev, P., Caldwell, D. G., & Lipson, H. (2012). Learning fast quadruped robot gaits with the rl power spline parameterization. Bulgarian Academy of Sciences, Cybernetics and Information Technologies, 12(3), 66–75.

  • Stulp, F., Buchli, J., Theodorou, E., & Schaal, S. (2010). Reinforcement learning of full-body humanoid motor skills. In Proceedings of the IEEE international conference on humanoid robots, Nashville, TN, USA (pp. 405–410).

  • Sugihara, T., & Nakamura, Y. (2009). Boundary condition relaxation method for stepwise pedipulation planning of biped robot. IEEE Transactions on Robotics, 25(3), 658–669.

  • Theodorou, E., Buchli, J., & Schaal, S. (2010a). Reinforcement learning of motor skills in high dimensions: A path integral approach. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Anchorage, US.

  • Theodorou, E., Buchli, J., & Schaal, S. (2010b). A generalized path integral control approach to reinforcement learning. The Journal of Machine Learning Research, 11, 3137–3181.

  • Ugurlu, B., Hirabayashi, T., & Kawamura, A. (2009). A unified control frame for stable bipedal walking. In IEEE international conference on industrial electronics and control, Porto, Portugal (pp. 4167–4172).

  • Ugurlu, B., Tsagarakis, N. G., Spyrakos-Papastravridis, E., & Caldwell, D. G. (2011). Compliant joint modification and real-time dynamic walking implementation on bipedal robot cCub. In Proceedings of the IEEE international conference on mechatronics, Istanbul, Turkey.

  • Ugurlu, B., Saglia, J. A., Tsagarakis, N. G., Morfey, S., & Caldwell, D. G. (2014). Bipedal hopping pattern generation for passively compliant humanoids: Exploiting the resonance. IEEE Transactions on Industrial Electronics, 61(10), 5431–5443.

  • Wada, Y., & Sumita, K. (2004). A reinforcement learning scheme for acquisition of via-point representation of human motion. In Proceedings of the IEEE international conference on neural networks (Vol. 2, pp. 1109–1114).

  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256.

  • Wisse, M., Schwab, A. L., van der Linde, R. Q., & van der Helm, F. C. T. (2005). How to keep from falling forward: Elementary swing leg action for passive dynamic walkers. IEEE Transactions on Robotics, 21(3), 393–401.

  • Xiaoxiang, Y., & Iida, F. (2014). Minimalistic models of an energy-efficient vertical-hopping robot. IEEE Transactions on Industrial Electronics, 61(2), 1053–1062.


Acknowledgements

This work was partially supported by the EU project AMARSi, under the contract FP7-ICT-248311.

Author information


Corresponding author

Correspondence to Barkan Ugurlu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 28136 KB)

Appendix: bipedal walking gait generator

Given the z-axis CoM trajectory, we utilize the ZMP concept for the x-axis and y-axis CoM trajectories in order to obtain walking patterns with dynamic balance. To generate real-time bipedal walking patterns that use the vertical CoM trajectory produced by the RL component, we adopt the resolution method explained in Kagami et al. (2002), using the Thomas algorithm (Ugurlu et al. 2009). Considering the one-mass model, the CoM position and the ZMP position are described as \(P = (p_x, p_y, p_z)\) and \(Q =(q_x, q_y, 0)\), respectively. As described in Kajita et al. (2003), Choi et al. (2007), Harada et al. (2004), and Sugihara and Nakamura (2009), the abstracted x-axis ZMP equation takes the following form,

$$\begin{aligned} q_x = p_x - \frac{\ddot{p}_x}{\ddot{p}_z+g}p_z , \end{aligned}$$
(6)

where g is the gravitational acceleration. The vertical CoM position (\(p_z\)) and acceleration (\(\ddot{p}_z\)) are provided by the learning algorithm for all times, as previously stated. As the next step, (6) is discretized in \(p_x\) as follows:

$$\begin{aligned} \ddot{p}_x(t) = \frac{p_x(i+1) - 2p_x(i) + p_x(i-1)}{\varDelta t^2}, \end{aligned}$$
(7)

where \(\varDelta t\) is the sampling period and i is the discrete time index, running from 0 to n, the total number of discrete events. Inserting (7) into (6), we obtain the following:

$$\begin{aligned} p_x(i+1) = -\frac{b(i)}{c(i)}p_x(i) - p_x(i-1) + \frac{q_x(i)}{c(i)} ; \end{aligned}$$
(8)
$$\begin{aligned} b(i) = 1 - 2c(i); \quad c(i) = \frac{-p_z(i)}{(\ddot{p}_z(i)+g)\, \varDelta t^2} . \end{aligned}$$
(9)

In order to solve this tridiagonal equation efficiently, we employ the Thomas algorithm (Ugurlu et al. 2009). To do so, the initial and final positions of the x-axis CoM (\(p_x(0)\) and \(p_x(n)\)) must be given in advance. Therefore, for a given reference ZMP trajectory and the corresponding initial and final conditions, the CoM trajectory can be calculated. For that purpose, the tridiagonal equation is rearranged as follows.

$$\begin{aligned} p_x(i)=e(i+1)p_x(i+1)+f(i+1) . \end{aligned}$$
(10)

In (10), \(e(i+1)\) and \(f(i+1)\) can be defined as follows:

$$\begin{aligned} e(i+1)=-\frac{c(i)}{c(i)e(i)+b(i)} , \end{aligned}$$
(11)
$$\begin{aligned} f(i+1)=\frac{q_x(i)-c(i)f(i)}{c(i)e(i)+b(i)} . \end{aligned}$$
(12)

Combining (10), (11), and (12) yields (13).

$$\begin{aligned} p_x(i)=-\frac{c(i)}{c(i)e(i)+b(i)}p_x(i+1)+\frac{q_x(i)-c(i)f(i)}{c(i)e(i)+b(i)} . \end{aligned}$$
(13)

Recalling that \(p_x(0) = x_0\) and \(p_x(n) = x_n\), e(1) and f(1) are set to 0 and \(x_0\), respectively. Applying the Thomas algorithm to this tridiagonal equation, we obtain the x-axis component of the CoM trajectory. Executing the identical procedure for the y-axis CoM position yields all components of the CoM trajectory in real time, since the vertical CoM position has already been determined by the RL algorithm.
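
For illustration, the following Python sketch shows one way the above forward/backward sweep could be implemented. It is a minimal sketch, not the authors' implementation; the function name, the NumPy dependency, and the array layout (trajectories sampled at n+1 discrete events) are assumptions.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def solve_com_horizontal(q_ref, p_z, p_z_ddot, dt, p_start, p_end):
    """Illustrative sketch: solve the tridiagonal system (8)-(9) for one
    horizontal CoM component with the Thomas algorithm, Eqs. (10)-(13).

    q_ref     -- reference ZMP trajectory (length n+1)
    p_z       -- vertical CoM positions from the RL policy (length n+1)
    p_z_ddot  -- vertical CoM accelerations (length n+1)
    dt        -- sampling period
    p_start   -- boundary condition p_x(0) = x_0
    p_end     -- boundary condition p_x(n) = x_n
    """
    q_ref = np.asarray(q_ref, dtype=float)
    n = len(q_ref) - 1

    # Coefficients of the tridiagonal system, Eq. (9).
    c = -np.asarray(p_z, dtype=float) / ((np.asarray(p_z_ddot, dtype=float) + G) * dt**2)
    b = 1.0 - 2.0 * c

    # Forward sweep, Eqs. (11)-(12), with e(1) = 0 and f(1) = x_0.
    e = np.zeros(n + 1)
    f = np.zeros(n + 1)
    e[1], f[1] = 0.0, p_start
    for i in range(1, n):
        denom = c[i] * e[i] + b[i]
        e[i + 1] = -c[i] / denom
        f[i + 1] = (q_ref[i] - c[i] * f[i]) / denom

    # Backward substitution, Eq. (10): p(i) = e(i+1) p(i+1) + f(i+1).
    p = np.zeros(n + 1)
    p[n] = p_end
    for i in range(n - 1, -1, -1):
        p[i] = e[i + 1] * p[i + 1] + f[i + 1]
    return p
```

Calling the routine once with the x-axis ZMP reference and once with the y-axis reference, together with the vertical CoM trajectory and acceleration produced by the RL policy, yields the full CoM trajectory described above.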


Cite this article

Kormushev, P., Ugurlu, B., Caldwell, D.G. et al. Learning to exploit passive compliance for energy-efficient gait generation on a compliant humanoid. Auton Robot 43, 79–95 (2019). https://doi.org/10.1007/s10514-018-9697-6

