
Optimal Control and Reinforcement Learning for Robot: A Survey

Conference paper

Abstract

As systems and their applications have grown more complex, conventional control approaches have become limited by system complexity and required functionality. The development of reinforcement learning and optimal control has become an impetus for engineering and has shown great potential for automation. Currently, optimization applications in robotics face challenges caused by model bias, high-dimensional systems, and computational complexity. To address these issues, several studies have proposed data-driven optimization approaches. This survey reviews achievements in optimal control and reinforcement learning for robots. It is not a complete and exhaustive survey, but it presents some recent and notable results in optimal control for robots. The background and problem statement are introduced first. Solutions to existing issues in robot control and some notable control methods in these areas are then briefly reviewed. In addition, the survey discusses future development prospects along four research directions: improving control efficiency, artificial assistant learning, applications in extreme environments, and related subjects. From this perspective, interdisciplinary research is essential for engineering fields built on optimal control methods; it would not only make engineering equipment more intelligent but also extend the applications of optimal control approaches.
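As a concrete point of reference for the model-based optimal control the survey covers, the sketch below solves a finite-horizon linear-quadratic regulator (LQR) by backward Riccati recursion. This is a generic textbook illustration, not an algorithm from the paper; the double-integrator dynamics, cost weights, and horizon are arbitrary choices made for the example.

```python
import numpy as np

# Finite-horizon LQR via backward Riccati recursion (illustrative sketch).
# Dynamics: x_{t+1} = A x_t + B u_t; cost: sum of x'Qx + u'Ru over the horizon.

def lqr_gains(A, B, Q, R, horizon):
    """Return time-varying feedback gains K_t such that u_t = -K_t x_t."""
    P = Q.copy()                       # terminal cost-to-go matrix
    gains = []
    for _ in range(horizon):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # Riccati backward update
        gains.append(K)
    return gains[::-1]                 # reorder gains forward in time

# Double-integrator example (unit mass, unit time step) -- arbitrary test system.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

K = lqr_gains(A, B, Q, R, horizon=50)
x = np.array([[1.0], [0.0]])           # initial state: offset position, zero velocity
for t in range(50):
    u = -K[t] @ x                      # optimal linear state feedback
    x = A @ x + B @ u
print("final state:", x.ravel())       # regulated toward the origin
```

The recursion above assumes an exact model (A, B). The data-driven approaches the abstract alludes to replace such known dynamics with learned or sampled quantities, which is precisely where model bias and sample efficiency become the central difficulties.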




Acknowledgment

This work was supported by the Research Development Fund RDF-20-01-08 provided by Xi’an Jiaotong-Liverpool University.

Author information


Corresponding author

Correspondence to Yuqing Chen.



Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Feng, H., Yu, L., Chen, Y. (2021). Optimal Control and Reinforcement Learning for Robot: A Survey. In: Gao, H., Wang, X. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-030-92635-9_4


  • DOI: https://doi.org/10.1007/978-3-030-92635-9_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92634-2

  • Online ISBN: 978-3-030-92635-9

  • eBook Packages: Computer Science, Computer Science (R0)
