Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration

Song, Ruizhuo; Wei, Qinglai; Xiao, Wendong

doi:10.1007/s00521-015-2144-0

Original Article
Published: 05 January 2016

Volume 28, pages 1435–1441, (2017)

Neural Computing and Applications

Ruizhuo Song¹, Qinglai Wei² & Wendong Xiao¹

406 Accesses
2 Citations

Abstract

This paper establishes an optimal control scheme for unknown complex-valued nonlinear systems. Policy iteration is used to solve the Hamilton–Jacobi–Bellman equation. Off-policy learning allows the iterative performance index and the iterative control to be obtained even when the system dynamics are completely unknown. Critic and action networks are used to approximate the iterative performance index and the iterative control, and thereby carry out policy evaluation and policy improvement. Asymptotic stability of the closed-loop system and convergence of the iterative performance index function are proven. Using a Lyapunov technique, the weight errors are proven to be uniformly ultimately bounded. A simulation study demonstrates the effectiveness of the proposed optimal control method.
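The alternation between policy evaluation and policy improvement mentioned in the abstract can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the authors' off-policy, neural-network-based algorithm: it assumes a scalar complex-valued linear system dx/dt = a·x + b·u with known coefficients, a quadratic cost with weights q and r, and a value function of the form p·|x|². The function names, the coefficients and the initial gain are all hypothetical choices made for illustration.

```python
import math


def policy_iteration(a, b, q, r, k0, iterations=20):
    """Model-based policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Illustrative assumption: cost = integral of q*|x|^2 + r*|u|^2,
    value function V(x) = p*|x|^2 with real p > 0.
    """
    k = k0
    history = []
    for _ in range(iterations):
        closed_loop = a - b * k
        if closed_loop.real >= 0:
            raise ValueError("policy is not stabilizing (inadmissible)")
        # Policy evaluation: solve 2*p*Re(a - b*k) + q + r*|k|^2 = 0 for p.
        p = (q + r * abs(k) ** 2) / (-2.0 * closed_loop.real)
        # Policy improvement: the minimizing feedback gain for this p.
        k = b.conjugate() * p / r
        history.append(p)
    return p, k, history


def riccati_solution(a, b, q, r):
    """Positive root of 2*Re(a)*p - |b|^2*p^2/r + q = 0, for comparison."""
    b2 = abs(b) ** 2
    return r * (a.real + math.sqrt(a.real ** 2 + b2 * q / r)) / b2


# Hypothetical example: an unstable open-loop system and a stabilizing initial gain.
a, b = 0.5 + 1.0j, 1.0 + 1.0j
p, k, history = policy_iteration(a, b, q=1.0, r=1.0, k0=2.0 - 2.0j)
p_star = riccati_solution(a, b, q=1.0, r=1.0)
print("value coefficient after iteration:", p)
print("Riccati solution for comparison:  ", p_star)
```

In this toy setting the evaluation step needs the model coefficients a and b explicitly. According to the abstract, the paper's off-policy formulation avoids that requirement, which is the part this sketch does not reproduce.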

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61304079 and 61374105, in part by the Fundamental Research Funds for the Central Universities under Grant FRF-TP-15-056A3, and in part by the Open Research Project from SKLMCCS under Grant 20150104.

Author information

Authors and Affiliations

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Ruizhuo Song & Wendong Xiao
The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Qinglai Wei

Corresponding author

Correspondence to Qinglai Wei.

About this article

Cite this article

Song, R., Wei, Q. & Xiao, W. Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration. Neural Comput & Applic 28, 1435–1441 (2017). https://doi.org/10.1007/s00521-015-2144-0

Received: 12 May 2015
Accepted: 21 December 2015
Published: 05 January 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s00521-015-2144-0
