Abstract
With the development of artificial intelligence, reinforcement learning plays an increasingly important role in the field of robot manipulation. In this paper, a trajectory optimization method based on local dynamic model fitting is proposed to improve sample utilization and reduce the difficulty of learning a dynamics model. Firstly, a Gaussian mixture model of the robot is constructed, from which an accurate local dynamics model is obtained through the Normal-inverse-Wishart distribution. Secondly, an LQR optimization algorithm is used to optimize the robot trajectory, yielding the optimal control strategy for the robot's grasping process. Finally, the effectiveness of the proposed algorithm is verified on a dynamic simulation platform. The experimental results show that the proposed method significantly improves sample utilization and learning efficiency.
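To make the pipeline concrete, the following is a minimal sketch of the local dynamics fitting step described above: a time-varying linear-Gaussian model \(s_{t+1} \approx F_{sat}[s_t, a_t] + f_t\) estimated from the empirical moments of sampled transitions (the coefficient formulas are derived in the appendix). The function name and the moment-based estimator shown here are illustrative assumptions; the paper additionally regularizes these moments with a Gaussian mixture model under a Normal-inverse-Wishart prior, which is omitted here.

```python
import numpy as np

def fit_local_dynamics(S, A, S_next):
    """Fit a local linear-Gaussian dynamics model
        s_{t+1} ~ N(F_sat @ [s_t; a_t] + f_t, Sigma)
    from sampled transitions using empirical moments.
    Illustrative sketch: the paper's full method conditions a GMM
    with a Normal-inverse-Wishart prior instead of raw moments.
    """
    Y = np.hstack([S, A])          # inputs  y_t = [s_t, a_t]
    X = S_next                     # targets x_t = s_{t+1}
    mu_Y, mu_X = Y.mean(axis=0), X.mean(axis=0)
    Yc, Xc = Y - mu_Y, X - mu_X
    n = len(Y)
    Sigma_Y = Yc.T @ Yc / n + 1e-6 * np.eye(Y.shape[1])  # small ridge for stability
    Sigma_YX = Yc.T @ Xc / n
    # F_sat^T = Sigma_Y^{-1} Sigma_YX  (see appendix)
    F_sat = np.linalg.solve(Sigma_Y, Sigma_YX).T
    # f_t = mu_X - F_sat mu_Y         (see appendix)
    f_t = mu_X - F_sat @ mu_Y
    return F_sat, f_t
```

The returned \(F_{sat}\) and \(f_t\) define the linearized transition model that the LQR backward pass consumes at each time step.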
Acknowledgment
We would like to thank all the participants in the experiments in this paper. This work is supported by the Key Laboratory of Space Utilization, Chinese Academy of Sciences under grant Y7031661SY, the National Natural Science Foundation of China under grant No. 61502463, and the Youth Innovation Promotion Association CAS.
Appendix
In this appendix, we present the derivation of the dynamics model coefficients. Firstly, we define some temporary variables.
The random variables \(\left[ {s_t ,a_t } \right]\) and \(s_{t + 1}\) are denoted by \(Y\) and \(X\), respectively. \(\mu_X\) and \(\Sigma_X\) denote the mean and covariance of \(X\), with \(\Sigma_X = \delta_X \delta_X^T\); \(\mu_Y\) and \(\Sigma_Y\) denote the mean and covariance of \(Y\), with \(\Sigma_Y = \delta_Y \delta_Y^T\).
Secondly, we prove \(F_{sat}^T = \Sigma_Y^{ - 1} \Sigma_{YX}\).
\(\begin{gathered} \Sigma_Y F_{sat}^T - \Sigma_{YX} \hfill \\ = \delta_Y \delta_Y^T F_{sat}^T - \delta_Y \delta_X^T \hfill \\ = \iint {p(x,y)\left[ {(y - E(y))(y - E(y))^T F_{sat}^T - (y - E(y))(x - E(x))^T } \right]}dxdy \hfill \\ = \int_y {(y - E(y))\left[ {\int_x {p(x,y)\left[ {F_{sat} y - F_{sat} E(y) - x + E(x)} \right]^T dx} } \right]dy} \hfill \\ \end{gathered}\) where \(p(x,y) = p(s_t ,a_t ,s_{t + 1} )\).
As \(E(X) = E(F_{sat} Y + f_t ) = F_{sat} E(Y) + f_t\), substituting \(E(x) = F_{sat} E(y) + f_t\) into the equation above, the inner bracket becomes \(F_{sat} y + f_t - x\), whose conditional expectation given \(y\) vanishes: \(\int_x {p(x,y)\left[ {F_{sat} y + f_t - x} \right]^T dx} = p(y)\left[ {F_{sat} y + f_t - E(X|Y = y)} \right]^T = 0\). Therefore \(\Sigma_Y F_{sat}^T = \Sigma_{YX}\), i.e. \(F_{sat}^T = \Sigma_Y^{ - 1} \Sigma_{YX}\).
Thirdly, we prove \(f_t = \mu_X - F_{sat} \mu_Y\).
As \(E(X) = E(s_{t + 1} ) = \mu_X\) and \(E(Y) = E(s_t ,a_t ) = \mu_Y\), substituting into \(E(X) = F_{sat} E(Y) + f_t\) we can obtain \(\mu_X = F_{sat} \mu_Y + f_t\), i.e. \(f_t = \mu_X - F_{sat} \mu_Y\).
Finally, we prove \(\delta_{st} = \Sigma_X - F_{sat} \Sigma_Y F_{sat}^T\).
As we know that \(E(X|Y = y) = F_{sat} y + f_t\), we can derive that \(D(E(X|Y)) = D(F_{sat} Y + f_t ) = F_{sat} \Sigma_Y F_{sat}^T\).
And because \(D(X|Y) = \delta_{st}\), we can derive that \(E(D(X|Y)) = \delta_{st}\).
As \(D(X) = D(E(X|Y)) + E(D(X|Y))\) by the law of total variance, we can get \(\Sigma_X = F_{sat} \Sigma_Y F_{sat}^T + \delta_{st}\), i.e. \(\delta_{st} = \Sigma_X - F_{sat} \Sigma_Y F_{sat}^T\).
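The three coefficient formulas derived above can be checked numerically: construct a joint Gaussian over \((Y, X)\) with \(X = FY + f + \varepsilon\), compute its exact moments, and verify that the formulas recover \(F\), \(f\), and the conditional covariance. The dimensions and variable names below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dy, dx = 4, 3
F = rng.normal(size=(dx, dy))                # true F_sat
f = rng.normal(size=dx)                      # true f_t
mu_Y = rng.normal(size=dy)
L = rng.normal(size=(dy, dy))
Sigma_Y = L @ L.T + dy * np.eye(dy)          # SPD input covariance
noise = np.diag(rng.uniform(0.1, 1.0, dx))   # true conditional covariance delta_st

# Exact joint moments of (Y, X) under X = F Y + f + eps
mu_X = F @ mu_Y + f
Sigma_YX = Sigma_Y @ F.T                     # Cov(Y, X)
Sigma_X = F @ Sigma_Y @ F.T + noise

# Recover the coefficients from the moments via the appendix formulas
F_sat = np.linalg.solve(Sigma_Y, Sigma_YX).T        # F_sat^T = Sigma_Y^{-1} Sigma_YX
f_t = mu_X - F_sat @ mu_Y                           # f_t = mu_X - F_sat mu_Y
delta_st = Sigma_X - F_sat @ Sigma_Y @ F_sat.T      # delta_st = Sigma_X - F_sat Sigma_Y F_sat^T

assert np.allclose(F_sat, F)
assert np.allclose(f_t, f)
assert np.allclose(delta_st, noise)
```

All three assertions hold exactly (up to floating-point error) because the identities are algebraic consequences of the joint Gaussian moments, independent of the particular \(F\), \(f\), and covariances drawn.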
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, J., Yan, S., Sun, G., Yu, G., Guo, L. (2023). Robot Trajectory Optimization with Reinforcement Learning Based on Local Dynamic Fitting. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol. 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6494-9
Online ISBN: 978-981-99-6495-6