
Heuristic anytime approaches to stochastic decision processes


Abstract

This paper proposes a set of methods for solving stochastic decision problems modeled as partially observable Markov decision processes (POMDPs). The approach, the Real Time Heuristic Decision System (RT-HDS), combines prediction methods with several existing heuristic decision algorithms. The prediction process builds a tree of predicted beliefs, and the value function at the final step is estimated with classic heuristic decision methods. To illustrate how the approach works, comparative results for several algorithms on a variety of simple and complex benchmark problems are reported. The algorithm has also been tested in a mobile robot supervision architecture.
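
The abstract outlines a depth-bounded lookahead over a tree of predicted beliefs, with classic heuristics supplying the value function at the leaves. The Python sketch below illustrates that general scheme only; it is not the authors' RT-HDS implementation, and the function names (belief_update, heuristic_leaf_value, lookahead_value) and the QMDP-style leaf heuristic are illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes update of belief b after action a and observation o.

    Assumed conventions: T[a][s, s2] = P(s2 | s, a) and
    O[a][s2, o] = P(o | s2, a); b is a probability vector over states.
    """
    unnorm = O[a][:, o] * (T[a].T @ b)   # P(o|s2,a) * predicted state dist.
    total = unnorm.sum()
    return unnorm / total if total > 0 else b

def heuristic_leaf_value(b, Q_mdp):
    """QMDP-style heuristic: score a leaf belief with fully observable
    MDP Q-values, i.e. max_a sum_s b(s) Q(s, a)."""
    return max(float(b @ Q_mdp[:, a]) for a in range(Q_mdp.shape[1]))

def lookahead_value(b, depth, T, O, R, Q_mdp, gamma=0.95):
    """Depth-bounded expectimax over the belief tree.

    Interior nodes maximize over actions and take expectations over
    observations; leaves fall back to the heuristic value function.
    """
    if depth == 0:
        return heuristic_leaf_value(b, Q_mdp)
    best = -np.inf
    for a in range(len(T)):
        value = float(b @ R[:, a])       # expected immediate reward
        pred = T[a].T @ b                # predicted next-state distribution
        for o in range(O[a].shape[1]):
            p_o = float(pred @ O[a][:, o])   # P(o | b, a)
            if p_o > 1e-12:
                b_next = belief_update(b, a, o, T, O)
                value += gamma * p_o * lookahead_value(
                    b_next, depth - 1, T, O, R, Q_mdp, gamma)
        best = max(best, value)
    return best
```

Run with increasing depth as deliberation time allows, a search of this kind has the anytime character the title refers to: a usable answer exists after the shallowest pass and improves with each deeper one.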



Author information


Correspondence to Joaquín L. Fernández or Rafael Sanz.


Cite this article

Fernández, J.L., Sanz, R., Simmons, R.G. et al. Heuristic anytime approaches to stochastic decision processes. J Heuristics 12, 181–209 (2006). https://doi.org/10.1007/s10732-006-4834-3

