Survey of Model-Based Reinforcement Learning: Applications on Robotics

Journal of Intelligent & Robotic Systems

Abstract

Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations that address real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded. Moreover, model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, in this survey, we cover model-based methods that have been applied in robotics and categorize them based on the derivation of the optimal policy, the definition of the returns function, the type of the transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches to new applications, taking into consideration the state of the art in both algorithms and hardware.
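To make the model-based setting concrete, the sketch below shows the generic loop that most of the surveyed methods share in some form: interact with the system, fit a transition model from the collected transitions, and use that model to select actions. It is a minimal illustrative skeleton on assumed toy one-dimensional dynamics; the `step`, `fit_transition_model`, and `plan_with_model` functions are hypothetical placeholders and do not reproduce any specific method covered in the survey.

```python
# Minimal sketch of the generic model-based RL loop (illustrative only):
# collect transitions, fit a transition model, then plan with the model.
# The 1-D linear system and random-shooting planner are hypothetical
# stand-ins, not code from any surveyed method.
import numpy as np

rng = np.random.default_rng(0)

def step(x, u):
    """Unknown true dynamics: x' = 0.9*x + 0.5*u + noise; reward penalizes distance from 0."""
    x_next = 0.9 * x + 0.5 * u + 0.01 * rng.standard_normal()
    reward = -x_next ** 2
    return x_next, reward

def fit_transition_model(data):
    """Least-squares fit of x' ~ a*x + b*u from the observed transitions."""
    X = np.array([[x, u] for x, u, _ in data])
    y = np.array([x_next for _, _, x_next in data])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # (a, b)

def plan_with_model(theta, x, horizon=5, n_candidates=64):
    """Random-shooting planner: roll candidate action sequences out in the model."""
    a, b = theta
    best_u, best_ret = 0.0, -np.inf
    for _ in range(n_candidates):
        us = rng.uniform(-1.0, 1.0, size=horizon)
        x_sim, ret = x, 0.0
        for u in us:
            x_sim = a * x_sim + b * u      # model rollout, no real interaction
            ret += -x_sim ** 2
        if ret > best_ret:
            best_ret, best_u = ret, us[0]
    return best_u

data = []
for episode in range(5):
    x = 2.0
    for t in range(20):
        # Act randomly until at least one transition is available to fit the model.
        u = rng.uniform(-1.0, 1.0) if not data else plan_with_model(fit_transition_model(data), x)
        x_next, r = step(x, u)
        data.append((x, u, x_next))
        x = x_next
    print(f"episode {episode}: final |x| = {abs(x):.3f}")
```

In realistic robotic settings the surveyed methods replace the least-squares model with, for example, Gaussian processes or neural networks, and the random-shooting planner with policy search or trajectory optimization; the overall collect-fit-plan structure, however, remains the same.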



Author information

Corresponding author

Correspondence to Athanasios S. Polydoros.

About this article

Cite this article

Polydoros, A.S., Nalpantidis, L. Survey of Model-Based Reinforcement Learning: Applications on Robotics. J Intell Robot Syst 86, 153–173 (2017). https://doi.org/10.1007/s10846-017-0468-y

