Abstract
With the development of artificial intelligence, reinforcement learning plays an increasingly important role in the field of robot manipulation. In this paper, a trajectory optimization method based on local dynamic model fitting is proposed to improve sample utilization and reduce the difficulty of learning a dynamics model. Firstly, a Gaussian mixture model of the robot is constructed, from which an accurate local dynamics model is obtained through the Normal-inverse-Wishart distribution. Secondly, an LQR optimization algorithm is used to optimize the robot trajectory, yielding the optimal control strategy for the robot's grasping process. Finally, the effectiveness of the proposed algorithm is verified on a dynamic simulation platform. The experimental results show that the proposed method significantly improves sample utilization and learning efficiency.
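To make the pipeline concrete, the following is a minimal sketch of the local dynamics fitting step described above: a time-varying linear-Gaussian model \(s_{t+1} \approx F_{sat}[s_t, a_t] + f_t\) estimated from the empirical moments of sampled transitions (the coefficient formulas are derived in the appendix). The function name and the moment-based estimator shown here are illustrative assumptions; the paper additionally regularizes these moments with a Gaussian mixture model under a Normal-inverse-Wishart prior, which is omitted here.

```python
import numpy as np

def fit_local_dynamics(S, A, S_next):
    """Fit a local linear-Gaussian dynamics model
        s_{t+1} ~ N(F_sat @ [s_t; a_t] + f_t, Sigma)
    from sampled transitions using empirical moments.
    Illustrative sketch: the paper's full method conditions a GMM
    with a Normal-inverse-Wishart prior instead of raw moments.
    """
    Y = np.hstack([S, A])          # inputs  y_t = [s_t, a_t]
    X = S_next                     # targets x_t = s_{t+1}
    mu_Y, mu_X = Y.mean(axis=0), X.mean(axis=0)
    Yc, Xc = Y - mu_Y, X - mu_X
    n = len(Y)
    Sigma_Y = Yc.T @ Yc / n + 1e-6 * np.eye(Y.shape[1])  # small ridge for stability
    Sigma_YX = Yc.T @ Xc / n
    # F_sat^T = Sigma_Y^{-1} Sigma_YX  (see appendix)
    F_sat = np.linalg.solve(Sigma_Y, Sigma_YX).T
    # f_t = mu_X - F_sat mu_Y         (see appendix)
    f_t = mu_X - F_sat @ mu_Y
    return F_sat, f_t
```

The returned \(F_{sat}\) and \(f_t\) define the linearized transition model that the LQR backward pass consumes at each time step.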
Acknowledgment
We would like to thank all the participants in the experiments in this paper. This work is supported by the Key Laboratory of Space Utilization, Chinese Academy of Sciences under grant Y7031661SY, the National Natural Science Foundation of China under grant No. 61502463, and the Youth Innovation Promotion Association CAS.
Appendix
In this appendix, we present the derivation of the dynamics model coefficients. Firstly, we define some temporary variables.
The random variables \(\left[ {s_t ,a_t } \right]\) and \(s_{t + 1}\) are denoted by \(Y\) and \(X\), respectively. \(\mu_X\) and \(\Sigma_X\) denote the mean and covariance of \(X\), with \(\Sigma_X = \delta_X \delta_X^T\); \(\mu_Y\) and \(\Sigma_Y\) denote the mean and covariance of \(Y\), with \(\Sigma_Y = \delta_Y \delta_Y^T\).
Secondly, we prove \(F_{sat}^T = \Sigma_Y^{ - 1} \Sigma_{YX}\).
\(\begin{gathered} \Sigma_Y F_{sat}^T - \Sigma_{YX} \hfill \\ = \delta_Y \delta_Y^T F_{sat}^T - \delta_Y \delta_X^T \hfill \\ = \iint {p(x,y)\left[ {(y - E(y))(y - E(y))^T F_{sat}^T - (y - E(y))(x - E(x))^T } \right]}dxdy \hfill \\ = \int_y {(y - E(y))\left[ {\int_x {p(x,y)\left[ {F_{sat} y - F_{sat} E(y) - x + E(x)} \right]^T dx} } \right]dy} \hfill \\ \end{gathered}\) where \(p(x,y) = p(s_t ,a_t ,s_{t + 1} )\).
As \(E(X) = E(F_{sat} Y + f_t ) = F_{sat} E(Y) + f_t\), substituting \(E(x) = F_{sat} E(y) + f_t\) into the equation above, the inner bracket becomes \(F_{sat} y + f_t - x\), whose conditional expectation given \(y\) vanishes: \(\int_x {p(x,y)\left[ {F_{sat} y + f_t - x} \right]^T dx} = p(y)\left[ {F_{sat} y + f_t - E(X|Y = y)} \right]^T = 0\). Therefore \(\Sigma_Y F_{sat}^T = \Sigma_{YX}\), i.e. \(F_{sat}^T = \Sigma_Y^{ - 1} \Sigma_{YX}\).
Thirdly, we prove \(f_t = \mu_X - F_{sat} \mu_Y\).
As \(E(X) = E(s_{t + 1} ) = \mu_X\) and \(E(Y) = E(s_t ,a_t ) = \mu_Y\), substituting into \(E(X) = F_{sat} E(Y) + f_t\) we can obtain \(\mu_X = F_{sat} \mu_Y + f_t\), i.e. \(f_t = \mu_X - F_{sat} \mu_Y\).
Finally, we prove \(\delta_{st} = \Sigma_X - F_{sat} \Sigma_Y F_{sat}^T\).
As we know that \(E(X|Y = y) = F_{sat} y + f_t\), we can derive that \(D(E(X|Y)) = D(F_{sat} Y + f_t ) = F_{sat} \Sigma_Y F_{sat}^T\).
And because \(D(X|Y) = \delta_{st}\), we can derive that \(E(D(X|Y)) = \delta_{st}\).
As \(D(X) = D(E(X|Y)) + E(D(X|Y))\) by the law of total variance, we can get \(\Sigma_X = F_{sat} \Sigma_Y F_{sat}^T + \delta_{st}\), i.e. \(\delta_{st} = \Sigma_X - F_{sat} \Sigma_Y F_{sat}^T\).
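The three coefficient formulas derived above can be checked numerically: construct a joint Gaussian over \((Y, X)\) with \(X = FY + f + \varepsilon\), compute its exact moments, and verify that the formulas recover \(F\), \(f\), and the conditional covariance. The dimensions and variable names below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dy, dx = 4, 3
F = rng.normal(size=(dx, dy))                # true F_sat
f = rng.normal(size=dx)                      # true f_t
mu_Y = rng.normal(size=dy)
L = rng.normal(size=(dy, dy))
Sigma_Y = L @ L.T + dy * np.eye(dy)          # SPD input covariance
noise = np.diag(rng.uniform(0.1, 1.0, dx))   # true conditional covariance delta_st

# Exact joint moments of (Y, X) under X = F Y + f + eps
mu_X = F @ mu_Y + f
Sigma_YX = Sigma_Y @ F.T                     # Cov(Y, X)
Sigma_X = F @ Sigma_Y @ F.T + noise

# Recover the coefficients from the moments via the appendix formulas
F_sat = np.linalg.solve(Sigma_Y, Sigma_YX).T        # F_sat^T = Sigma_Y^{-1} Sigma_YX
f_t = mu_X - F_sat @ mu_Y                           # f_t = mu_X - F_sat mu_Y
delta_st = Sigma_X - F_sat @ Sigma_Y @ F_sat.T      # delta_st = Sigma_X - F_sat Sigma_Y F_sat^T

assert np.allclose(F_sat, F)
assert np.allclose(f_t, f)
assert np.allclose(delta_st, noise)
```

All three assertions hold exactly (up to floating-point error) because the identities are algebraic consequences of the joint Gaussian moments, independent of the particular \(F\), \(f\), and covariances drawn.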
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, J., Yan, S., Sun, G., Yu, G., Guo, L. (2023). Robot Trajectory Optimization with Reinforcement Learning Based on Local Dynamic Fitting. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol. 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6494-9
Online ISBN: 978-981-99-6495-6