Abstract
Standard robotic control works well under ordinary conditions, but when conditions change (e.g., one of the motors is damaged), the robot can no longer accomplish its task. We need an algorithm that provides the robot with the ability to adapt to unforeseen situations. Reinforcement learning offers a framework that meets this requirement, but it needs large data sets to learn robotic tasks, which is impractical. We discuss using Gaussian processes to improve the data efficiency of reinforcement learning: a Gaussian process learns a state-transition model from data gathered during an interaction phase on the robot, and the learned GP model is then used to simulate trajectories and optimize the robot's controller in a simulation phase. The PILCO algorithm is considered the most data-efficient RL algorithm of this kind. It gives promising results on the cart-pole task, where a working controller was learned after only seconds of interaction with the real robot; the total training time, however, including the simulation phase, was considerably longer. In this work, we leverage the capabilities of computational graphs to produce a ROS-friendly Python implementation of PILCO and discuss a case study of a real-world robotic task.
This work was supported by the Russian Science Foundation, project no. 18-71-00143.
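To make the two-phase scheme above concrete, here is a minimal Python sketch of the interaction phase: running a controller on the system, recording transitions, and fitting a GPflow Gaussian process to the observed state differences. This is an illustration only, not the paper's implementation: the Pendulum-v0 environment, the linear policy, and all function names are assumptions, and the GPflow 2 and classic OpenAI Gym APIs are assumed.

```python
import gym
import numpy as np
import gpflow

# Hypothetical setup: a continuous-control Gym task stands in for the robot.
env = gym.make("Pendulum-v0")
theta = np.zeros(env.observation_space.shape[0])  # linear controller weights

def policy(s):
    # Linear state-feedback controller, clipped to the valid torque range.
    return np.clip([theta @ s], env.action_space.low, env.action_space.high)

def rollout(env, policy, horizon):
    # Interaction phase: run the controller on the (real) system and record
    # (state, action) -> state-difference pairs for the dynamics model.
    X, Y = [], []
    s = env.reset()
    for _ in range(horizon):
        a = policy(s)
        s_next, _, done, _ = env.step(a)  # classic 4-tuple Gym API assumed
        X.append(np.hstack([s, a]))  # model input: state and action
        Y.append(s_next - s)         # model target: change of state
        s = s_next
        if done:
            break
    return np.array(X, dtype=np.float64), np.array(Y, dtype=np.float64)

# Fit a GP transition model to the collected data (GPflow 2-style API).
X, Y = rollout(env, policy, horizon=40)
model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Simulation phase (not shown): PILCO propagates uncertain states through
# this GP model and optimizes the controller parameters on the predicted
# long-term cost, so only short interaction phases are needed on the robot.
```

Modelling state differences rather than raw next states is the usual choice in PILCO-style learning, since the differences are closer to zero-mean and easier for the GP prior to capture.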
Cite this chapter
Younes, A., Panov, A.I. (2019). Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes. In: Osipov, G., Panov, A., Yakovlev, K. (eds.) Artificial Intelligence. Lecture Notes in Computer Science, vol. 11866. Springer, Cham. https://doi.org/10.1007/978-3-030-33274-7_11