Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Problem formulation is often an important first step in solving a problem effectively. For sequential decision problems, the Markov decision process (MDP) ([2]; [22]) is a commonly used model formulation, owing to its generality, flexibility, and applicability to a wide range of problems. Despite these advantages, three conditions must be satisfied before the MDP model can be applied (a minimal sketch illustrating all three follows the list):

  1. The environment model is given in advance (a completely known environment).

  2. The environment states are completely observable (fully observable states, implying a Markovian environment).

  3. The environment parameters do not change over time (a stationary environment).
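
For concreteness, here is a minimal sketch of a small discrete MDP that satisfies all three conditions, solved by standard value iteration. It is not taken from the chapter; the state count, discount factor, transition probabilities, and rewards are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, discount factor 0.9.
n_states, n_actions, gamma = 3, 2, 0.9

# Conditions 1 and 3: the transition model T[a, s, s'] = P(s' | s, a) and the
# reward R[s, a] are given in advance and never change over time.
T = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]],  # action 1
])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])

# Condition 2: the state is fully observable, so planning operates directly
# on state values (no belief state is needed).  Standard value iteration:
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * np.einsum('ast,t->sa', T, V)  # Q(s, a)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)
print("optimal state values:", V)
print("greedy policy:", policy)
```

When condition 2 fails, the problem becomes a partially observable MDP and planning must instead operate on belief states; when condition 3 fails, the environment is nonstationary, which is the setting addressed by the hidden-mode model developed in this chapter.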


References

  1. R. E. Bellman (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.

  2. R. E. Bellman (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6:679–684.

  3. J. A. Boyan and M. L. Littman (1994). Packet routing in dynamically changing networks: a reinforcement learning approach. In Advances in Neural Information Processing Systems 6, pages 671–678, San Mateo, California. Morgan Kaufmann.

  4. A. R. Cassandra, M. L. Littman, and N. Zhang (1997). Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In Uncertainty in Artificial Intelligence, Providence, RI.

  5. H.-T. Cheng (1988). Algorithms for Partially Observable Markov Decision Processes. PhD thesis, University of British Columbia, British Columbia, Canada.

  6. S. P. M. Choi (2000). Reinforcement Learning in Nonstationary Environments. PhD thesis, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China, Jan.

  7. S. P. M. Choi, D. Y. Yeung, and N. L. Zhang (1999). An environment model for nonstationary reinforcement learning. In Advances in Neural Information Processing Systems 12. To appear.

  8. L. Chrisman (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In AAAI-92.

  9. R. H. Crites and A. G. Barto (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8.

  10. P. Dayan and T. J. Sejnowski (1996). Exploration bonuses and dual control. Machine Learning, 25(1):5–22, Oct.

  11. T. Jaakkola, S. P. Singh, and M. I. Jordan (1995). Monte-Carlo reinforcement learning in non-Markovian decision problems. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, MA. The MIT Press.

  12. L. P. Kaelbling, M. L. Littman, and A. W. Moore (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, May.

  13. L. J. Lin and T. M. Mitchell (1992). Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, School of Computer Science, Carnegie Mellon University.

  14. M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995a). Learning policies for partially observable environments: Scaling up. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362–370, San Francisco, CA. Morgan Kaufmann.

  15. M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995b). Efficient dynamic-programming updates in partially observable Markov decision processes. Technical Report CS-95-19, Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA.

  16. M. L. Littman and D. H. Ackley (1991). Adaptation in constant utility non-stationary environments. In R. K. Belew and L. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 136–142, San Mateo, CA, Dec. Morgan Kaufmann.

  17. W. S. Lovejoy (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28:47–66.

  18. A. McCallum (1993). Overcoming incomplete perception with utile distinction memory. In Tenth International Machine Learning Conference, Amherst, MA.

  19. A. McCallum (1995). Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, Dec.

  20. G. E. Monahan (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28:1–16.

  21. C. H. Papadimitriou and J. N. Tsitsiklis (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450.

  22. M. L. Puterman (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons.

  23. L. R. Rabiner (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), Feb.

  24. J. H. Schmidhuber (1990). Reinforcement learning in Markovian and non-Markovian environments. In D. S. Lippman, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 500–506, San Mateo, CA. Morgan Kaufmann.

  25. S. Singh and D. P. Bertsekas (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 9.

  26. E. J. Sondik (1971). The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, Stanford, California, USA.

  27. R. S. Sutton (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216–224. Morgan Kaufmann.

  28. R. S. Sutton and A. G. Barto (1998). Reinforcement Learning: An Introduction. The MIT Press.

  29. C. C. White III (1991). Partially observed Markov decision processes: A survey. Annals of Operations Research, 32.

  30. N. L. Zhang, S. S. Lee, and W. Zhang (1999). A method for speeding up value iteration in partially observable Markov decision processes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.

  31. N. L. Zhang and W. Liu (1997). A model approximation scheme for planning in partially observable stochastic domains. Journal of Artificial Intelligence Research, 7:199–230.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Choi, S.P.M., Yeung, D.Y., Zhang, N.L. (2000). Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science, vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_12

  • DOI: https://doi.org/10.1007/3-540-44565-X_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4

