Abstract
As systems and their applications have developed, conventional control approaches have become limited by system complexity and functional demands. Advances in reinforcement learning and optimal control have become an impetus for engineering, showing great potential for automation. At present, optimization applications in robotics face challenges caused by model bias, high-dimensional systems, and computational complexity. To address these issues, several studies have proposed data-driven optimization approaches. This survey reviews achievements in optimal control and reinforcement learning approaches for robots. It is not a complete and exhaustive survey, but it presents some recent and remarkable achievements in the optimal control of robots. The survey first introduces the background and the problem statement, then briefly reviews the development of solutions to existing issues in robot control and some notable control methods in these areas. In addition, it discusses future research directions from four aspects: improving control efficiency, artificial-assisted learning, applications in extreme environments, and related subjects. From this perspective, interdisciplinary research is essential for engineering fields built on optimal control methods; it would not only make engineering equipment more intelligent but also extend the applications of optimal control approaches.
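To make the data-driven theme above concrete, the following is a minimal illustrative sketch (not taken from the paper) of one common pattern the survey covers: collect transitions under random exploration, fit a dynamics model by least squares, and choose actions with random-shooting model-predictive control on the learned model. The double-integrator "robot", cost weights, and horizon are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical double-integrator robot: state [position, velocity], scalar force input.
# The true dynamics are unknown to the controller; they are used here only to
# generate training data and to simulate the closed loop.
dt = 0.05

def true_step(x, u):
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([0.5 * dt**2, dt])
    return A @ x + B * u

# 1) Collect transitions with random exploration.
rng = np.random.default_rng(0)
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(500):
    u = rng.uniform(-1.0, 1.0)
    xn = true_step(x, u)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn if np.all(np.abs(xn) < 5.0) else np.zeros(2)  # reset if the state drifts

# 2) Fit a linear dynamics model x' ~ [x, u] @ W by least squares.
Phi = np.hstack([np.array(X), np.array(U)[:, None]])
W, *_ = np.linalg.lstsq(Phi, np.array(Xn), rcond=None)

def model_step(x, u):
    return np.hstack([x, u]) @ W

# 3) Random-shooting MPC: sample action sequences, roll them out through the
#    learned model, and execute the first action of the lowest-cost sequence.
def mpc_action(x, horizon=15, samples=200):
    seqs = rng.uniform(-1.0, 1.0, size=(samples, horizon))
    costs = np.zeros(samples)
    for i, seq in enumerate(seqs):
        xi = x.copy()
        for u in seq:
            xi = model_step(xi, u)
            costs[i] += xi @ xi + 0.1 * u**2  # quadratic state and effort cost
    return seqs[np.argmin(costs), 0]

# Closed loop: regulate the state to the origin from an offset start.
x = np.array([1.0, 0.0])
for t in range(100):
    x = true_step(x, mpc_action(x))
print("final state:", x)  # should end near [0, 0]
```

Because the plan is recomputed at every step from the current measured state, residual model bias is partially compensated by feedback, which is one of the motivations for data-driven optimal control noted in the abstract.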
Acknowledgment
This work was supported by the Research Development Fund RDF-20-01-08 provided by Xi’an Jiaotong-Liverpool University.