
An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Abstract

In complex process industries, model-free adaptive control under a data-driven scheme is a classic problem. This paper proposes an improved reinforcement learning (RL) based heuristic dynamic programming (HDP) algorithm for optimal tracking control of industrial systems. The proposed method designs a dual neural-network framework and employs a gradient-based optimization scheme to derive the optimal control law. Inspired by the experience replay buffer in deep RL, short-term historical system trajectories are also incorporated into the training phase, which stabilizes network learning. An experimental study on a simulated industrial device shows that the proposed method outperforms comparable algorithms in both time consumption and control accuracy.
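The abstract sketches an actor-critic training loop: one network estimates the cost-to-go, the other produces the control law, both are trained by gradient descent, and short-term historical trajectories are replayed to stabilize learning. The minimal Python sketch below illustrates one plausible reading of that loop, using an action-dependent critic so the scheme stays model-free; the network sizes, the quadratic utility, the discount factor, and all other hyperparameters are assumptions, since the abstract does not give the paper's actual design.

import random
from collections import deque

import torch
import torch.nn as nn

GAMMA = 0.95                  # discount factor (assumed)
STATE_DIM, ACT_DIM = 3, 1     # illustrative dimensions, not from the paper
BATCH = 32

def mlp(n_in, n_out):
    # Small two-layer network; the paper's actual architecture is not given.
    return nn.Sequential(nn.Linear(n_in, 64), nn.Tanh(), nn.Linear(64, n_out))

critic = mlp(STATE_DIM + ACT_DIM, 1)   # action-dependent cost-to-go estimate Q(x, u)
actor = mlp(STATE_DIM, ACT_DIM)        # control law u = pi(x)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
buffer = deque(maxlen=500)             # short-term trajectory memory (assumed size)

def utility(x, u):
    # Quadratic tracking cost U = x'Qx + u'Ru with Q = I, R = 0.1*I (assumed).
    return (x ** 2).sum(1, keepdim=True) + 0.1 * (u ** 2).sum(1, keepdim=True)

def train_step():
    if len(buffer) < BATCH:
        return
    x, u, x_next = map(torch.stack, zip(*random.sample(buffer, BATCH)))
    # Critic: shrink the Bellman residual Q(x,u) - [U(x,u) + gamma * Q(x', pi(x'))].
    with torch.no_grad():
        target = utility(x, u) + GAMMA * critic(torch.cat([x_next, actor(x_next)], 1))
    loss_c = ((critic(torch.cat([x, u], 1)) - target) ** 2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # Actor: gradient descent through the critic on the cost of its own actions.
    loss_a = critic(torch.cat([x, actor(x)], 1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

# At each sampling instant: apply the actor's control to the plant, store the
# transition, and replay short-term history, e.g.
#   buffer.append((x, u.detach(), x_next)); train_step()

An action-dependent critic is assumed here because plain HDP needs the plant model (or a learned model network) to pass gradients from the critic back to the actor; the paper's exact mechanism may differ.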

Acknowledgments

This research was funded by the National Key Research and Development Program of China under grants No. 2019YFC0605300 and No. 2016YFB0700500, the National Natural Science Foundation of China under grants No. 61572075, No. 61702036 and No. 61873299, and the Key Research Plan of Hainan Province under grant No. ZDYF2019009.

Author information

Corresponding author

Correspondence to Xiaojuan Ban.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Yuan, Z., Ban, X. (2020). An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol. 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_23

  • DOI: https://doi.org/10.1007/978-3-030-61616-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
