Abstract
Markov decision processes (MDPs) are a standard model in Artificial Intelligence planning, where they are used to construct optimal or near-optimal policies or plans. One issue often missing from discussions of planning in stochastic environments is how MDPs handle safety constraints expressed as a bound on the probability of reaching threat states. We introduce a method for finding a value-optimal policy that satisfies such a safety constraint, and report on the validity and effectiveness of our method through a set of experiments.
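The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the kind of computation involved, not the authors' method: standard value iteration on a small, randomly generated MDP, followed by a check that the resulting greedy policy keeps the probability of ever reaching an absorbing threat state below a threshold. All names and parameters (n_states, THREAT, delta, P, R) are hypothetical.

import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95
THREAT = 3          # index of the absorbing threat ("fatal") state
delta = 0.1         # maximum acceptable probability of reaching THREAT

# P[a, s, s'] = transition probability, R[s, a] = immediate reward
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
P[:, THREAT, :] = 0.0
P[:, THREAT, THREAT] = 1.0      # make the threat state absorbing
R[THREAT, :] = 0.0

# Standard value iteration for the (unconstrained) value-optimal policy.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)

# Evaluate the probability of ever reaching THREAT under this policy by
# iterating r(s) <- sum_s' P[pi(s), s, s'] * r(s'), with r(THREAT) fixed to 1.
reach = np.zeros(n_states)
reach[THREAT] = 1.0
Ppi = P[policy, np.arange(n_states), :]   # transition matrix under the policy
for _ in range(1000):
    reach = Ppi @ reach
    reach[THREAT] = 1.0

safe = reach[0] <= delta   # safety check at a hypothetical start state 0
print("reach probability:", reach[0], "satisfies constraint:", safe)

Note that this sketch only verifies the safety constraint for one candidate policy; the paper's contribution is a method that searches for a value-optimal policy satisfying the constraint, which this fragment does not attempt to reproduce.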