
Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes

Artificial Intelligence

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11866)

Abstract

Standard robotic control works well under ordinary conditions, but when conditions change (e.g. one of the motors is damaged), the robot can no longer achieve its task. We need an algorithm that gives the robot the ability to adapt to unforeseen situations. Reinforcement learning provides a framework that meets these requirements, but it needs large data sets to learn robotic tasks, which is impractical. We discuss using Gaussian processes to improve the data efficiency of reinforcement learning: a Gaussian process learns a state transition model from data collected on the robot (interaction phase), and the learned GP model is then used to simulate trajectories and optimize the robot's controller (simulation phase). The PILCO algorithm is considered the most data-efficient RL algorithm. It gives promising results on the cart-pole task, where a working controller was learned after only seconds of interaction on the real robot, although the total training time, including the simulation phase, was longer. In this work, we leverage the abilities of computational graphs to produce a ROS-friendly Python implementation of PILCO, and discuss a case study of a real-world robotic task.

This work was supported by the Russian Science Foundation, project no. 18-71-00143.
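The sketch below is a minimal illustration of the model-based loop described in the abstract (learn a GP transition model from interaction data, then optimize the controller on simulated rollouts). It is not the authors' implementation: it uses scikit-learn's GaussianProcessRegressor in place of the GPflow/TensorFlow models the chapter implies, a hypothetical two-dimensional toy system and quadratic cost, and a crude random search over linear gains instead of PILCO's gradient-based policy optimization; it also propagates only GP means, whereas PILCO propagates full predictive distributions.

# Minimal sketch of the interaction/simulation loop (illustrative only; toy dynamics,
# cost, and random-search policy update are assumptions, not the paper's method).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def true_dynamics(s, a):
    # Hypothetical 2-D system (angle, angular velocity) used only to generate interaction data.
    th, om = s
    om_new = om + 0.1 * (a - 0.5 * np.sin(th))
    return np.array([th + 0.1 * om_new, om_new])

def cost(s):
    # Quadratic penalty on distance from the target state (origin).
    return s[0] ** 2 + 0.1 * s[1] ** 2

# Interaction phase: run the real system once, recording (state, action) -> next state.
rng = np.random.default_rng(0)
states, actions, next_states = [], [], []
s = np.array([np.pi / 4, 0.0])
for _ in range(40):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_dynamics(s, a)
    states.append(s); actions.append([a]); next_states.append(s_next)
    s = s_next
X = np.hstack([np.array(states), np.array(actions)])   # GP input: (state, action)
Y = np.array(next_states)                               # GP target: next state

# One GP per output dimension, as in standard multi-output GP regression.
gps = [GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, Y[:, d])
       for d in range(Y.shape[1])]

def simulated_cost(theta, horizon=25):
    # Simulation phase: roll out the linear policy a = theta . s on the learned GP (means only).
    s = np.array([np.pi / 4, 0.0])
    total = 0.0
    for _ in range(horizon):
        a = float(np.clip(theta @ s, -1.0, 1.0))
        x = np.hstack([s, a]).reshape(1, -1)
        s = np.array([gp.predict(x)[0] for gp in gps])
        total += cost(s)
    return total

# Crude controller improvement: random search over linear gains using only the learned model.
best_theta, best_cost = None, np.inf
for _ in range(200):
    theta = rng.normal(size=2)
    c = simulated_cost(theta)
    if c < best_cost:
        best_theta, best_cost = theta, c
print("best gains:", best_theta, "predicted cost:", best_cost)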


Author information


Corresponding author

Correspondence to Aleksandr I. Panov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Younes, A., Panov, A.I. (2019). Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes. In: Osipov, G., Panov, A., Yakovlev, K. (eds) Artificial Intelligence. Lecture Notes in Computer Science, vol 11866. Springer, Cham. https://doi.org/10.1007/978-3-030-33274-7_11


  • DOI: https://doi.org/10.1007/978-3-030-33274-7_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33273-0

  • Online ISBN: 978-3-030-33274-7

  • eBook Packages: Computer Science (R0)
