Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration

  • Research Paper
  • Science China Information Sciences

Abstract

This study investigates the Pareto optimal strategy problem for multi-player mean-field stochastic systems governed by Itô differential equations using reinforcement learning (RL), and derives a partially model-free solution for Pareto optimal control. First, by exploiting the convexity of the cost functions, the Pareto optimal control problem is solved via a weighted-sum optimal control problem. Subsequently, using on-policy RL, a novel policy iteration (PI) algorithm based on the ℋ-representation technique is presented. In particular, by alternating between the policy evaluation and policy update steps, the Pareto optimal control policy is obtained once system performance no longer improves, which avoids directly solving the complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples demonstrate the effectiveness of the proposed algorithm.
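
The alternation between policy evaluation and policy update can be illustrated with a small sketch. This is not the algorithm of the article: it assumes a simplified linear-quadratic setting without mean-field terms, it is fully model-based (the matrices A, B, C, D are known), and it uses a plain Kronecker-product vectorization rather than the ℋ-representation. All names, matrices, and weights below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed toy model (not from the article):
#   dx = (A x + B u) dt + (C x + D u) dw,  u = -K x,
# with two players whose costs are combined by a weighted sum.

def weighted_sum(mats, alpha):
    """Scalarize per-player weighting matrices with weights alpha."""
    return sum(a * M for a, M in zip(alpha, mats))

def evaluate_policy(A, B, C, D, Q, R, K):
    """Policy evaluation: solve the generalized Lyapunov equation
    Ac'P + P Ac + Cc'P Cc + Q + K'RK = 0 for the fixed gain K."""
    n = A.shape[0]
    Ac, Cc = A - B @ K, C - D @ K
    I = np.eye(n)
    # Vectorized form of the linear map P -> Ac'P + P Ac + Cc'P Cc.
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T) + np.kron(Cc.T, Cc.T)
    rhs = -(Q + K.T @ R @ K).reshape(-1)
    P = np.linalg.solve(L, rhs).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def improve_policy(B, C, D, R, P):
    """Policy update: K = (R + D'PD)^{-1} (B'P + D'PC)."""
    return np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)

def policy_iteration(A, B, C, D, Q, R, K0, tol=1e-10, max_iter=50):
    """Alternate evaluation and update until P stops changing.
    K0 must be mean-square stabilizing."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        P = evaluate_policy(A, B, C, D, Q, R, K)
        K = improve_policy(B, C, D, R, P)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K

def gare_residual(A, B, C, D, Q, R, P):
    """Residual of the generalized algebraic Riccati equation at P."""
    S = B.T @ P + D.T @ P @ C
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - S.T @ np.linalg.solve(R + D.T @ P @ D, S))

# Hypothetical two-player example; each player has one scalar input.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.eye(2)                 # columns are B1 and B2
C = 0.1 * np.eye(2)
D = 0.1 * B
alpha = (0.4, 0.6)            # assumed Pareto weights, summing to 1
Q = weighted_sum([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])], alpha)
R = weighted_sum([np.diag([1.0, 2.0]), np.diag([2.0, 1.0])], alpha)

K0 = np.zeros((2, 2))         # stabilizing here because A is stable and the noise is small
P, K = policy_iteration(A, B, C, D, Q, R, K0)
print("max GARE residual:", np.abs(gare_residual(A, B, C, D, Q, R, P)).max())
```

In this sketch the Riccati equation is never solved directly; each iteration only solves a linear equation in P, and the residual check at the end confirms that the fixed point satisfies the corresponding Riccati equation. Different choices of `alpha` would trace out different Pareto optimal gains under the convexity assumption stated in the abstract.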

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62103442, 12326343, 62373229), the Natural Science Foundation of Shandong Province (Grant No. ZR2021QF080), the Fundamental Research Funds for the Central Universities (Grant No. 23CX06024A), and the Outstanding Youth Innovation Team in Shandong Higher Education Institutions (Grant No. 2023KJ061).

Author information

Corresponding author

Correspondence to Dongya Zhao.

About this article

Cite this article

Jiang, X., Wang, Y., Zhao, D. et al. Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration. Sci. China Inf. Sci. 67, 140202 (2024). https://doi.org/10.1007/s11432-023-3982-y

  • DOI: https://doi.org/10.1007/s11432-023-3982-y
