Abstract
This paper proposes a set of methods for solving stochastic decision problems modeled as partially observable Markov decision processes (POMDPs). The approach, called Real-Time Heuristic Decision System (RT-HDS), combines prediction with several existing heuristic decision algorithms: prediction is performed by building a tree, and the value function at the final step of the tree is computed with one of the classic heuristic decision methods. To illustrate how the approach works, comparative results for the different algorithms on a variety of simple and complex benchmark problems are reported. The algorithm has also been tested in a mobile robot supervision architecture.
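The general scheme the abstract describes, depth-limited lookahead over a prediction tree with a classic heuristic value at the horizon, can be illustrated with a short sketch. The code below is not the authors' RT-HDS implementation; it is a minimal example, under assumed conventions, of a belief-tree lookahead for a discrete POMDP in which the leaves are scored with a QMDP-style heuristic (value iteration on the underlying MDP). The model layout (transition, observation, and reward arrays) and the choice of leaf heuristic are illustrative assumptions.

```python
# Minimal sketch of depth-limited belief-tree lookahead for a discrete POMDP.
# Not the paper's RT-HDS; the model layout and leaf heuristic are assumptions.
import numpy as np

class POMDP:
    def __init__(self, T, O, R, gamma=0.95):
        # T[a][s, s'] : transition probabilities
        # O[a][s', o] : observation probabilities after taking a and reaching s'
        # R[a][s]     : immediate reward for action a in state s
        self.T, self.O, self.R, self.gamma = T, O, R, gamma
        self.nA, self.nS, self.nO = len(T), T[0].shape[0], O[0].shape[1]

    def belief_update(self, b, a, o):
        """Bayes filter: predict with T[a], correct with O[a][:, o]."""
        bp = (self.T[a].T @ b) * self.O[a][:, o]
        norm = bp.sum()                      # norm = P(o | b, a)
        return (bp / norm, norm) if norm > 0 else (b, 0.0)

def qmdp_heuristic(m, iters=100):
    """Value function of the underlying MDP, used as a leaf heuristic (assumption)."""
    V = np.zeros(m.nS)
    for _ in range(iters):
        V = np.max([m.R[a] + m.gamma * m.T[a] @ V for a in range(m.nA)], axis=0)
    return V

def lookahead(m, b, depth, V_leaf):
    """Expand the belief tree to `depth`; return (best value, best action)."""
    if depth == 0:
        return float(b @ V_leaf), None       # heuristic value at the horizon
    best_v, best_a = -np.inf, None
    for a in range(m.nA):
        v = float(b @ m.R[a])                # expected immediate reward
        for o in range(m.nO):
            b2, p_o = m.belief_update(b, a, o)   # branch on each observation
            if p_o > 0:
                v_child, _ = lookahead(m, b2, depth - 1, V_leaf)
                v += m.gamma * p_o * v_child
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a
```

Because a search of this kind can be cut off at any depth and still return a heuristically scored action, it fits the anytime setting of the paper's title: the tree is deepened only as far as the available decision time allows.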
Fernández, J.L., Sanz, R., Simmons, R.G. et al. Heuristic anytime approaches to stochastic decision processes. J Heuristics 12, 181–209 (2006). https://doi.org/10.1007/s10732-006-4834-3