Abstract
In this paper, we study the variance minimization problem of Markov decision processes (MDPs) in which the policy is parameterized by action selection probabilities or other general parameters. Unlike the average or discounted criteria commonly used in traditional MDP theory, the variance criterion is difficult to handle because of the non-Markovian property caused by the nonlinear (quadratic) structure of the variance function. Using the basic idea of sensitivity-based optimization, we derive a difference formula for the reward variance under any two parametric policies, as well as a variance derivative formula. With these sensitivity formulas, we obtain a necessary condition for the optimal policy with the minimal variance. We also prove that the optimal policy with the minimal variance can be found in the deterministic policy space. An iterative algorithm is further developed to efficiently reduce the reward variance, and this algorithm converges to a local optimal policy. Finally, we conduct numerical experiments to demonstrate the main results of this paper.
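The abstract's iterative scheme can be illustrated with a minimal sketch: a toy two-state MDP whose policy is parameterized by action selection probabilities, with the steady-state reward variance reduced by gradient descent on those parameters. All numbers, the finite-difference gradient, and the backtracking step size below are illustrative assumptions for exposition, not the paper's actual derivative formula or algorithm.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[a] is the transition matrix under action a; r[a] is the reward vector.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.3, 0.7], [0.6, 0.4]])]
r = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]

def steady_state_variance(theta):
    """Long-run variance of the reward under a randomized policy.

    theta[s] = probability of choosing action 0 in state s.
    Returns sum_s pi(s) * (r_theta(s) - eta)^2, where pi is the
    stationary distribution and eta the long-run average reward.
    """
    theta = np.clip(theta, 0.0, 1.0)
    # Policy-averaged transition matrix and reward vector.
    P_th = theta[:, None] * P[0] + (1 - theta)[:, None] * P[1]
    r_th = theta * r[0] + (1 - theta) * r[1]
    # Stationary distribution: solve pi P = pi with sum(pi) = 1.
    A = np.vstack([P_th.T - np.eye(2), np.ones(2)])
    b = np.array([0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    eta = pi @ r_th                       # average reward
    return pi @ (r_th - eta) ** 2         # steady-state variance

def minimize_variance(theta, lr=0.5, eps=1e-4, iters=100):
    """Finite-difference gradient descent on the variance, with a
    simple backtracking step size so the variance never increases."""
    v = steady_state_variance(theta)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for s in range(len(theta)):
            e = np.zeros_like(theta)
            e[s] = eps
            grad[s] = (steady_state_variance(theta + e)
                       - steady_state_variance(theta - e)) / (2 * eps)
        cand = np.clip(theta - lr * grad, 0.0, 1.0)
        v_cand = steady_state_variance(cand)
        if v_cand < v:
            theta, v = cand, v_cand      # accept the improving step
        else:
            lr *= 0.5                    # otherwise shrink the step
    return theta

theta0 = np.array([0.5, 0.5])
theta_star = minimize_variance(theta0)
```

Consistent with the paper's result that an optimal policy exists in the deterministic policy space, such descent iterates typically drift toward action probabilities at or near 0 or 1.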
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China (2016YFB0901900), the National Natural Science Foundation of China (61573206, 61203039, U1301254), and the Suzhou-Tsinghua Innovation Leading Action Project.
Additional information
This article belongs to the Topical Collection: Special Issue on Performance Analysis and Optimization of Discrete Event Systems
Guest Editors: Christos G. Cassandras and Alessandro Giua
Xia, L. Variance minimization of parameterized Markov decision processes. Discrete Event Dyn Syst 28, 63–81 (2018). https://doi.org/10.1007/s10626-017-0258-5