Variance minimization of parameterized Markov decision processes

Published in: Discrete Event Dynamic Systems

Abstract

In this paper, we study the variance minimization problem of Markov decision processes (MDPs) in which the policy is parameterized by action selection probabilities or other general parameters. Unlike the average or discounted criteria mostly used in traditional MDP theory, the variance criterion is difficult to handle because of the non-Markovian property caused by the nonlinear (quadratic) structure of the variance function. Using the basic idea of sensitivity-based optimization, we derive a difference formula for the reward variance under any two parametric policies, as well as a variance derivative formula. With these sensitivity formulas, we obtain a necessary condition for the policy with minimal variance, and we prove that such an optimal policy can be found in the deterministic policy space. We further develop an iterative algorithm that efficiently reduces the reward variance and converges to a local optimal policy. Finally, we conduct numerical experiments to demonstrate the main results of this paper.
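To make the variance criterion concrete, the following sketch evaluates the steady-state reward variance of a small parameterized Markov chain and reduces it by finite-difference gradient descent. This is an illustration of the criterion only, not the paper's sensitivity-based algorithm; the two-state MDP, the single sigmoid parameter `theta`, and all numerical values are hypothetical.

```python
import numpy as np

def steady_state(P):
    """Stationary distribution pi satisfying pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def reward_variance(theta, P0, P1, r0, r1):
    """Steady-state variance of the one-step reward under a parametric
    policy that mixes two base actions with probability sigmoid(theta)."""
    p = 1.0 / (1.0 + np.exp(-theta))   # action-selection probability
    P = p * P1 + (1 - p) * P0          # mixed transition matrix
    r = p * r1 + (1 - p) * r0          # mixed one-step reward
    pi = steady_state(P)
    mean = pi @ r
    return pi @ (r - mean) ** 2        # sum_s pi(s) (r(s) - mean)^2

# Toy two-state MDP (assumed for illustration).
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.5, 0.5], [0.6, 0.4]])
r0 = np.array([1.0, 3.0])
r1 = np.array([2.0, 2.0])

# Finite-difference gradient descent on theta to reduce the variance.
theta, step, eps = 0.0, 1.0, 1e-5
var_init = reward_variance(theta, P0, P1, r0, r1)
for _ in range(200):
    g = (reward_variance(theta + eps, P0, P1, r0, r1)
         - reward_variance(theta - eps, P0, P1, r0, r1)) / (2 * eps)
    theta -= step * g
var_final = reward_variance(theta, P0, P1, r0, r1)
```

In this toy instance the action-1 reward is constant across states, so pushing the selection probability toward 1 drives the steady-state reward variance toward zero; the descent recovers this. The paper's algorithm instead exploits the derived difference and derivative formulas rather than numerical differencing.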



Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (2016YFB0901900), the National Natural Science Foundation of China (61573206, 61203039, U1301254), and the Suzhou-Tsinghua Innovation Leading Action Project.

Author information

Correspondence to Li Xia.

Additional information

This article belongs to the Topical Collection: Special Issue on Performance Analysis and Optimization of Discrete Event Systems

Guest Editors: Christos G. Cassandras and Alessandro Giua

About this article

Cite this article

Xia, L. Variance minimization of parameterized Markov decision processes. Discrete Event Dyn Syst 28, 63–81 (2018). https://doi.org/10.1007/s10626-017-0258-5

