Risk-Sensitivity in Simulation Based Online Planning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11117))

Abstract

Making decisions under risk is a competence human beings naturally display when confronted with new and potentially dangerous learning tasks. In an effort to replicate this ability, many approaches have been proposed in different fields of artificial learning and planning. To plan in domains with inherent risk, given a simulation model, we propose Risk-Sensitive Online Planning (RISEON), which extends traditional online planning with a risk-aware optimization objective. The objective we use is Conditional Value at Risk (CVaR), whose risk-sensitivity can be controlled by setting the quantile size to fit a given risk level. By using CVaR, the planner shifts its focus from risk-neutral sample means towards the tail of the loss distribution, thus considering an adjustable share of high costs. We evaluate RISEON in a smart grid planning scenario and in a continuous control task, where the planner has to steer a vehicle towards risky checkpoints, and empirically show that the proposed algorithm can be used to plan in a risk-sensitive manner.
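The CVaR objective described in the abstract can be illustrated with a minimal sketch (not the authors' implementation; function and variable names are illustrative): the empirical CVaR at quantile level alpha is the mean of the worst alpha-fraction of sampled rollout costs, so alpha = 1 recovers the risk-neutral sample mean, while small alpha concentrates on the tail of the loss distribution.

```python
import math

def empirical_cvar(costs, alpha):
    """Empirical CVaR: mean of the worst (highest-cost) alpha-fraction
    of sampled costs. alpha = 1.0 gives the risk-neutral sample mean."""
    worst_first = sorted(costs, reverse=True)      # highest costs first
    k = max(1, math.ceil(alpha * len(worst_first)))  # size of the tail
    return sum(worst_first[:k]) / k

# Hypothetical simulated costs for two candidate actions:
safe  = [10.0, 11.0, 10.5, 10.2, 10.8]
risky = [1.0, 1.0, 1.0, 1.0, 100.0]  # usually cheap, rarely disastrous

# Risk-neutral (alpha = 1.0): the risky action looks worse only mildly.
print(empirical_cvar(safe, 1.0), empirical_cvar(risky, 1.0))   # 10.5  20.8
# Risk-averse (alpha = 0.2): the tail exposes the rare disaster.
print(empirical_cvar(safe, 0.2), empirical_cvar(risky, 0.2))   # 11.0  100.0
```

A planner optimizing this objective over simulated rollouts would rank actions by their tail cost rather than their mean, which is how the quantile size acts as the risk-level knob described above.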



Author information

Corresponding author

Correspondence to Kyrill Schmid.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Schmid, K. et al. (2018). Risk-Sensitivity in Simulation Based Online Planning. In: Trollmann, F., Turhan, A.-Y. (eds) KI 2018: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_20

  • DOI: https://doi.org/10.1007/978-3-030-00111-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00110-0

  • Online ISBN: 978-3-030-00111-7

  • eBook Packages: Computer Science (R0)
