Abstract
Making decisions under risk is a competence that human beings naturally display when confronted with new and potentially dangerous learning tasks. In an effort to replicate this ability, many approaches have been proposed across different fields of artificial learning and planning. For planning in domains with inherent risk, given a simulation model, we propose Risk-Sensitive Online Planning (RISEON), which extends traditional online planning with a risk-aware optimization objective. The objective we use is Conditional Value at Risk (CVaR), where risk-sensitivity can be controlled by setting the quantile size to match a given risk level. By using CVaR, the planner shifts its focus from risk-neutral sample means towards the tail of the loss distribution, thus considering an adjustable share of high costs. We evaluate RISEON in a smart grid planning scenario and in a continuous control task, where the planner has to steer a vehicle towards risky checkpoints, and empirically show that the proposed algorithm can plan with respect to a configurable level of risk-sensitivity.
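To illustrate the objective described above, the following is a minimal sketch of an empirical CVaR estimate over sampled costs, as a planner with a simulation model might use it. The function name, the quantile-size parameter `alpha`, and the exponential cost distribution are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst (highest-cost) alpha-fraction of cost samples.

    costs: 1-D array of sampled trajectory costs (illustrative).
    alpha: quantile size in (0, 1]; smaller alpha -> stronger focus
           on the tail of the loss distribution, i.e. more risk-averse.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    # Number of tail samples; at least one so the estimate is defined.
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[-k:].mean()  # tail mean over the k worst outcomes

# Risk-neutral mean vs. CVaR on a heavy-tailed cost sample:
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)
mean_cost = sample.mean()
cvar_cost = empirical_cvar(sample, alpha=0.05)
# CVaR averages only the worst 5% of outcomes, so it exceeds the mean here.
```

With `alpha = 1.0` the estimator reduces to the risk-neutral sample mean, so the same planning loop can interpolate between risk-neutral and strongly risk-averse behavior by adjusting a single parameter.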
© 2018 Springer Nature Switzerland AG
Cite this paper
Schmid, K. et al. (2018). Risk-Sensitivity in Simulation Based Online Planning. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science(), vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_20
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7