Abstract
Making decisions under risk is a competence that human beings naturally display when confronted with new and potentially dangerous learning tasks. In an effort to replicate this ability, many approaches have been proposed across different fields of artificial learning and planning. For planning in domains with inherent risk, given a simulation model, we propose Risk-Sensitive Online Planning (RISEON), which extends traditional online planning with a risk-aware optimization objective. The objective we use is Conditional Value at Risk (CVaR), where risk-sensitivity can be controlled by setting the quantile size to match a given risk level. By using CVaR, the planner shifts its focus from risk-neutral sample means towards the tail of the loss distribution, thus considering an adjustable share of high costs. We evaluate RISEON in a smart grid planning scenario and in a continuous control task, where the planner has to steer a vehicle towards risky checkpoints, and empirically show that the proposed algorithm can plan with respect to a configurable level of risk-sensitivity.
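To illustrate the objective described above, the following is a minimal sketch of an empirical CVaR estimate over sampled costs, as a planner with a simulation model might use it. The function name, the quantile-size parameter `alpha`, and the exponential cost distribution are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst (highest-cost) alpha-fraction of cost samples.

    costs: 1-D array of sampled trajectory costs (illustrative).
    alpha: quantile size in (0, 1]; smaller alpha -> stronger focus
           on the tail of the loss distribution, i.e. more risk-averse.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    # Number of tail samples; at least one so the estimate is defined.
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[-k:].mean()  # tail mean over the k worst outcomes

# Risk-neutral mean vs. CVaR on a heavy-tailed cost sample:
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)
mean_cost = sample.mean()
cvar_cost = empirical_cvar(sample, alpha=0.05)
# CVaR averages only the worst 5% of outcomes, so it exceeds the mean here.
```

With `alpha = 1.0` the estimator reduces to the risk-neutral sample mean, so the same planning loop can interpolate between risk-neutral and strongly risk-averse behavior by adjusting a single parameter.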
© 2018 Springer Nature Switzerland AG
Cite this paper
Schmid, K. et al. (2018). Risk-Sensitivity in Simulation Based Online Planning. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science(), vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_20
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7