
Robot Trajectory Optimization with Reinforcement Learning Based on Local Dynamic Fitting

Conference paper

Intelligent Robotics and Applications (ICIRA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14271)


Abstract

With the development of artificial intelligence, reinforcement learning plays an increasingly important role in the field of robot operation. In this paper, a trajectory optimization method based on local dynamic model fitting is proposed to improve sample utilization and reduce the difficulty of learning the dynamic model. Firstly, a Gaussian mixture model of the robot is constructed, and on this basis an accurate local dynamics model is obtained through the Normal-inverse-Wishart distribution. Secondly, the LQR optimization algorithm is used to optimize the robot trajectory and obtain the optimal control strategy for the robot's grasping process. Finally, the effectiveness of the proposed algorithm is verified on a dynamic simulation platform. The experimental results show that the proposed method significantly improves sample utilization and learning efficiency.
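As a rough illustration of the LQR step mentioned in the abstract (not the authors' implementation), the sketch below runs a standard finite-horizon LQR backward pass on a local linear model; the dynamics matrices A and B, the cost weights Q, R, Q_f, the horizon T, and the double-integrator example are all illustrative assumptions.

```python
# Minimal finite-horizon LQR backward pass: a sketch of the trajectory-
# optimization step, assuming a local linear model s_{t+1} = A s_t + B a_t
# and quadratic costs. All matrices below are illustrative, not from the paper.
import numpy as np

def lqr_backward_pass(A, B, Q, R, Q_f, T):
    """Return time-varying gains K_t such that a_t = -K_t @ s_t is optimal."""
    P = Q_f                                    # terminal cost-to-go matrix
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)          # discrete-time Riccati recursion
        gains.append(K)
    return gains[::-1]                         # gains ordered t = 0, ..., T-1

# Toy usage: a double-integrator "joint" with position/velocity state.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Q_f = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
gains = lqr_backward_pass(A, B, Q, R, Q_f, T=50)
print(gains[0])                                # feedback gain for the first step
```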



Acknowledgment

We would like to thank all participants in the experiments reported in this paper. This work is supported by the Key Laboratory of Space Utilization, Chinese Academy of Sciences, under grant Y7031661SY, by the National Natural Science Foundation of China under grant No. 61502463, and by the Youth Innovation Promotion Association, CAS.

Author information

Correspondence to Ji Liang.


Appendix

In this appendix, we present the derivation of the dynamic model coefficients. Firstly, we need to define some temporary variables.

The random variables \(\left[ {s_t, a_t} \right]\) and \(s_{t + 1}\) are denoted by \(Y\) and \(X\), respectively. \(\mu_Y\) and \(\Sigma_Y\) denote the mean and covariance of \(Y\), with \(\Sigma_Y = \delta_Y \delta_Y^T\); \(\mu_X\) and \(\Sigma_X\) denote the mean and covariance of \(X\), with \(\Sigma_X = \delta_X \delta_X^T\).

Secondly, we prove \(F_{sat}^T = \Sigma_Y^{ - 1} \Sigma_{YX}\).

$$ \begin{gathered} \Sigma_Y F_{sat}^T - \Sigma_{YX} \hfill \\ = \delta_Y \delta_Y^T F_{sat}^T - \delta_Y \delta_X^T \hfill \\ = \iint {p(x,y)\left[ {(y - E(y))(y - E(y))^T F_{sat}^T - (y - E(y))(x - E(x))^T } \right]dx\,dy} \hfill \\ = \int_y {(y - E(y))\left[ {\int_x {p(x,y)\left[ {F_{sat} y - F_{sat} E(y) - x + E(x)} \right]^T dx} } \right]dy} \hfill \\ \end{gathered} $$

where \(p(x,y) = p(s_t ,a_t ,s_{t + 1} )\).

As \(E(x) = E(F_{sat} y + f_t ) = F_{sat} E(y) + f_t\), substituting this into the equation above, we obtain:

$$ \begin{gathered} \Sigma_Y F_{sat}^T - \Sigma_{YX} \hfill \\ = \int_y {(y - E(y))\left[ {\int_x {p(x,y)\left[ {F_{sat} y - F_{sat} E(y) - x + F_{sat} E(y) + f_t } \right]^T dx} } \right]dy} \hfill \\ = \int_y {(y - E(y))\left[ {\int_x {p(x,y)\left[ {F_{sat} y - x + f_t } \right]^T dx} } \right]dy} \hfill \\ = \int_y {(y - E(y))p_y (y)\left[ {F_{sat} y + f_t - E(x|y)} \right]^T dy} \hfill \\ = \int_y {(y - E(y))p_y (y) \cdot 0\,dy} \hfill \\ = 0 \hfill \\ \end{gathered} $$
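As an informal numerical check of this identity (not part of the paper), one can draw samples \(y = [s_t, a_t]\) from a Gaussian, generate \(x = s_{t+1}\) through a known linear model, and recover \(F_{sat}\) from the joint moments; the dimensions, noise level, and sample size below are arbitrary assumptions.

```python
# Sketch: recover F_sat from sample moments and compare to the ground truth.
# Dimensions, noise scale, and sample size are arbitrary; with enough samples
# the recovered matrix matches the true one up to sampling error.
import numpy as np

rng = np.random.default_rng(1)
dim_y, dim_x, n = 5, 3, 200_000
F_true = rng.normal(size=(dim_x, dim_y))           # ground-truth F_sat
f_true = rng.normal(size=dim_x)                    # ground-truth f_t
Y = rng.normal(size=(n, dim_y)) @ rng.normal(size=(dim_y, dim_y))
X = Y @ F_true.T + f_true + 0.01 * rng.normal(size=(n, dim_x))

dY, dX = Y - Y.mean(0), X - X.mean(0)
Sigma_Y = dY.T @ dY / n                            # Cov(Y)
Sigma_YX = dY.T @ dX / n                           # Cov(Y, X)
F_hat = np.linalg.solve(Sigma_Y, Sigma_YX).T       # F_sat^T = Sigma_Y^{-1} Sigma_YX

print(np.max(np.abs(F_hat - F_true)))              # small; shrinks as n grows
```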

Thirdly, we prove \(f_t = \mu_X - F_{sat} \mu_Y\).

As \(E(X) = E(s_{t + 1} ) = \mu_X\) and \(E(Y) = E(\left[ {s_t ,a_t } \right]) = \mu_Y\), and since \(E(X) = F_{sat} E(Y) + f_t\) as shown above, we obtain:

$$ f_t = \mu_X - F_{sat} \mu_Y $$

Finally, we prove \(\delta_{st} = \Sigma_X - F_{sat} \Sigma_Y F_{sat}^T\).

Since \(E(X|Y = y) = F_{sat} y + f_t\), we can derive that:

$$ D(E(X|Y)) = D(F_{sat} Y + f_t ) = F_{sat} D(Y)F_{sat}^T = F_{sat} \Sigma_Y F_{sat}^T $$

Because the conditional covariance \(D(X|Y) = \delta_{st}\) is constant in \(Y\), we can derive that:

$$ E(D(X|Y)) = \delta_{st} $$

By the law of total variance, \(D(X) = D(E(X|Y)) + E(D(X|Y))\), so we obtain:

$$ \Sigma_X = F_{sat} \Sigma_Y F_{sat}^T + \delta_{st} $$
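The remaining two identities can be checked in the same informal way (again, an illustrative sketch rather than the paper's code): sample from a known linear-Gaussian model and compare \(\mu_X - F_{sat}\mu_Y\) with \(f_t\), and \(F_{sat}\Sigma_Y F_{sat}^T + \delta_{st}\) with \(\Sigma_X\); all names, dimensions, and parameters below are assumptions.

```python
# Sketch: verify f_t = mu_X - F_sat @ mu_Y and the law-of-total-variance
# identity Sigma_X = F_sat @ Sigma_Y @ F_sat^T + delta_st on synthetic data.
# All names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
dim_y, dim_x, n = 5, 3, 200_000
F_sat = rng.normal(size=(dim_x, dim_y))
f_t = rng.normal(size=dim_x)
L = rng.normal(size=(dim_x, dim_x))
delta_st = L @ L.T                                 # conditional (noise) covariance

Y = rng.normal(size=(n, dim_y))
X = Y @ F_sat.T + f_t + rng.normal(size=(n, dim_x)) @ L.T

mu_Y, mu_X = Y.mean(0), X.mean(0)
Sigma_Y = np.cov(Y, rowvar=False)
Sigma_X = np.cov(X, rowvar=False)

# Both residuals are approximately zero, up to Monte Carlo sampling error.
print(np.max(np.abs((mu_X - F_sat @ mu_Y) - f_t)))
print(np.max(np.abs(Sigma_X - (F_sat @ Sigma_Y @ F_sat.T + delta_st))))
```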


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Liang, J., Yan, S., Sun, G., Yu, G., Guo, L. (2023). Robot Trajectory Optimization with Reinforcement Learning Based on Local Dynamic Fitting. In: Yang, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_30


  • DOI: https://doi.org/10.1007/978-981-99-6495-6_30


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6494-9

  • Online ISBN: 978-981-99-6495-6

  • eBook Packages: Computer Science, Computer Science (R0)
