Abstract
In this study, the Pareto optimal strategy problem is investigated for multi-player mean-field stochastic systems governed by Itô differential equations using reinforcement learning (RL), and a partially model-free solution for Pareto optimal control is derived. First, by exploiting the convexity of the cost functions, the Pareto optimal control problem is transformed into a weighted-sum optimal control problem. Subsequently, using on-policy RL, we present a novel policy iteration (PI) algorithm based on the ℌ-representation technique. In particular, the algorithm alternates between policy evaluation and policy update steps and obtains the Pareto optimal control policy once no further improvement in system performance occurs. This avoids directly solving the complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.
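The weighted-sum reduction and the evaluation/update alternation can be illustrated with the minimal, model-based Python sketch below. It relies on several assumptions that are not taken from the paper. The system is taken to be dx = (Ax + Ā Ex + Bu + B̄ Eu)dt + (Cx + C̄ Ex + Du + D̄ Eu)dw. The players' inputs are stacked as u = [u_1; ...; u_N], with B = [B_1 ... B_N] and B̄, D, D̄ formed the same way. Each player i has hypothetical weights Q_i, Q̄_i, R_i, R̄_i, and the feedback has the form u = K(x − Ex) + L Ex. Unlike the paper's on-policy, partially model-free scheme, the sketch uses known system matrices and plain Kronecker vectorization instead of the ℌ-representation. All function names and example data are illustrative only.

```python
import numpy as np


def gen_lyap(Ac, Cc, M):
    """Solve Ac' P + P Ac + Cc' P Cc + M = 0 for symmetric P.

    Plain Kronecker vectorization using the column-major identity
    vec(X P Y) = kron(Y', X) vec(P); this is NOT the paper's H-representation.
    """
    n = Ac.shape[0]
    I = np.eye(n)
    op = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(Cc.T, Cc.T)
    p = np.linalg.solve(op, -M.reshape(-1, order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # symmetrize away round-off


def weighted_sum(alpha, mats):
    """Scalarize the players' weight matrices: sum_i alpha_i * M_i."""
    return sum(a * M for a, M in zip(alpha, mats))


def pareto_pi(sysm, Q, Qbar, R, Rbar, K, L, tol=1e-10, max_iter=100):
    """Model-based PI for the weighted-sum mean-field LQ problem (assumed form).

    (K, L) must be an initial mean-square stabilizing pair for
    u = K (x - Ex) + L Ex.
    """
    A, Abar, B, Bbar, C, Cbar, D, Dbar = sysm
    Ah, Bh, Ch, Dh = A + Abar, B + Bbar, C + Cbar, D + Dbar
    Qh, Rh = Q + Qbar, R + Rbar
    P_prev = Pi_prev = None
    for it in range(1, max_iter + 1):
        # Policy evaluation: one generalized Lyapunov equation for the
        # fluctuation x - Ex, one Lyapunov equation for the mean Ex.
        P = gen_lyap(A + B @ K, C + D @ K, Q + K.T @ R @ K)
        Cl = Ch + Dh @ L
        Pi = gen_lyap(Ah + Bh @ L, np.zeros_like(A),
                      Cl.T @ P @ Cl + Qh + L.T @ Rh @ L)
        # Policy update: minimize the Hamiltonian with P and Pi held fixed.
        K = -np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
        L = -np.linalg.solve(Rh + Dh.T @ P @ Dh, Bh.T @ Pi + Dh.T @ P @ Ch)
        # Stop once the evaluated cost no longer improves.
        if P_prev is not None and max(np.abs(P - P_prev).max(),
                                      np.abs(Pi - Pi_prev).max()) < tol:
            break
        P_prev, Pi_prev = P, Pi
    return K, L, P, Pi, it


def gare_residuals(sysm, Q, Qbar, R, Rbar, P, Pi):
    """Max-abs residuals of the two coupled GAREs (used only as a check)."""
    A, Abar, B, Bbar, C, Cbar, D, Dbar = sysm
    Ah, Bh, Ch, Dh = A + Abar, B + Bbar, C + Cbar, D + Dbar
    S = B.T @ P + D.T @ P @ C
    r1 = A.T @ P + P @ A + C.T @ P @ C + Q - S.T @ np.linalg.solve(R + D.T @ P @ D, S)
    Sh = Bh.T @ Pi + Dh.T @ P @ Ch
    r2 = (Ah.T @ Pi + Pi @ Ah + Ch.T @ P @ Ch + Q + Qbar
          - Sh.T @ np.linalg.solve(R + Rbar + Dh.T @ P @ Dh, Sh))
    return np.abs(r1).max(), np.abs(r2).max()


# Hypothetical two-player example: each player drives one state channel.
I2 = np.eye(2)
sysm = (np.array([[-1.0, 0.5], [0.2, -1.5]]), 0.1 * I2,  # A, Abar
        I2, np.zeros((2, 2)),                             # B = [B1 B2], Bbar
        0.2 * I2, 0.05 * I2,                              # C, Cbar
        0.1 * I2, np.zeros((2, 2)))                       # D, Dbar
alpha = [0.4, 0.6]                                        # Pareto weights
Q = weighted_sum(alpha, [I2, 2 * I2])
Qbar = weighted_sum(alpha, [0.5 * I2, 0.5 * I2])
R = weighted_sum(alpha, [np.diag([1.0, 2.0]), np.diag([2.0, 1.0])])
Rbar = np.zeros((2, 2))
# A and A + Abar are stable here, so K = L = 0 is a valid initial policy.
K, L, P, Pi, iters = pareto_pi(sysm, Q, Qbar, R, Rbar,
                               np.zeros((2, 2)), np.zeros((2, 2)))
print(iters, gare_residuals(sysm, Q, Qbar, R, Rbar, P, Pi))
```

Each iteration solves only linear Lyapunov-type equations. The converged pair (P, Π) should nonetheless satisfy both coupled GAREs, and gare_residuals checks this numerically. Under the convexity assumption, re-running with different positive weights alpha yields different Pareto optimal solutions.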
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62103442, 12326343, 62373229), the Natural Science Foundation of Shandong Province (Grant No. ZR2021QF080), the Fundamental Research Funds for the Central Universities (Grant No. 23CX06024A), and the Outstanding Youth Innovation Team in Shandong Higher Education Institutions (Grant No. 2023KJ061).
Cite this article
Jiang, X., Wang, Y., Zhao, D. et al. Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration. Sci. China Inf. Sci. 67, 140202 (2024). https://doi.org/10.1007/s11432-023-3982-y