Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Problem formulation is often an important first step in solving a problem effectively. For sequential decision problems, the Markov decision process (MDP) ([2]; [22]) is a commonly used model formulation, owing to its generality, flexibility, and applicability to a wide range of problems. Despite these advantages, three conditions must be satisfied before the MDP model can be applied (a minimal sketch illustrating all three follows the list):

  1. The environment model is given in advance (a completely known environment).

  2. The environment states are completely observable (fully observable states, implying a Markovian environment).

  3. The environment parameters do not change over time (a stationary environment).
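
For concreteness, here is a minimal sketch of a small discrete MDP that satisfies all three conditions, solved by standard value iteration. It is not taken from the chapter; the state count, discount factor, transition probabilities, and rewards are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, discount factor 0.9.
n_states, n_actions, gamma = 3, 2, 0.9

# Conditions 1 and 3: the transition model T[a, s, s'] = P(s' | s, a) and the
# reward R[s, a] are given in advance and never change over time.
T = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]],  # action 1
])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])

# Condition 2: the state is fully observable, so planning operates directly
# on state values (no belief state is needed).  Standard value iteration:
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * np.einsum('ast,t->sa', T, V)  # Q(s, a)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)
print("optimal state values:", V)
print("greedy policy:", policy)
```

When condition 2 fails, the problem becomes a partially observable MDP and planning must instead operate on belief states; when condition 3 fails, the environment is nonstationary, which is the setting addressed by the hidden-mode model developed in this chapter.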


References

  1. R. E. Bellman (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.

  2. R. E. Bellman (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6:679–684.

  3. J. A. Boyan and M. L. Littman (1994). Packet routing in dynamically changing networks: a reinforcement learning approach. In Advances in Neural Information Processing Systems 6, pages 671–678, San Mateo, California. Morgan Kaufmann.

  4. A. R. Cassandra, M. L. Littman, and N. Zhang (1997). Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In Uncertainty in Artificial Intelligence, Providence, RI.

  5. H.-T. Cheng (1988). Algorithms for Partially Observable Markov Decision Processes. PhD thesis, University of British Columbia, British Columbia, Canada.

  6. S. P. M. Choi (2000). Reinforcement Learning in Nonstationary Environments. PhD thesis, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China, Jan.

  7. S. P. M. Choi, D. Y. Yeung, and N. L. Zhang (1999). An environment model for nonstationary reinforcement learning. In Advances in Neural Information Processing Systems 12. To appear.

  8. L. Chrisman (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In AAAI-92.

  9. R. H. Crites and A. G. Barto (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8.

  10. P. Dayan and T. J. Sejnowski (1996). Exploration bonuses and dual control. Machine Learning, 25(1):5–22, Oct.

  11. T. Jaakkola, S. P. Singh, and M. I. Jordan (1995). Monte-Carlo reinforcement learning in non-Markovian decision problems. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, MA. The MIT Press.

  12. L. P. Kaelbling, M. L. Littman, and A. W. Moore (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, May.

  13. L. J. Lin and T. M. Mitchell (1992). Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, School of Computer Science, Carnegie Mellon University.

  14. M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995a). Learning policies for partially observable environments: Scaling up. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362–370, San Francisco, CA. Morgan Kaufmann.

  15. M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995b). Efficient dynamic-programming updates in partially observable Markov decision processes. Technical Report CS-95-19, Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA.

  16. M. L. Littman and D. H. Ackley (1991). Adaptation in constant utility non-stationary environments. In R. K. Belew and L. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 136–142, San Mateo, CA, Dec. Morgan Kaufmann.

  17. W. S. Lovejoy (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28:47–66.

  18. A. McCallum (1993). Overcoming incomplete perception with utile distinction memory. In Tenth International Machine Learning Conference, Amherst, MA.

  19. A. McCallum (1995). Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, Dec.

  20. G. E. Monahan (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28:1–16.

  21. C. H. Papadimitriou and J. N. Tsitsiklis (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450.

  22. M. L. Puterman (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons.

  23. L. R. Rabiner (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), Feb.

  24. J. H. Schmidhuber (1990). Reinforcement learning in Markovian and non-Markovian environments. In D. S. Lippman, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 500–506, San Mateo, CA. Morgan Kaufmann.

  25. S. Singh and D. P. Bertsekas (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 9.

  26. E. J. Sondik (1971). The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, Stanford, California, USA.

  27. R. S. Sutton (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216–224. Morgan Kaufmann.

  28. R. S. Sutton and A. G. Barto (1998). Reinforcement Learning: An Introduction. The MIT Press.

  29. C. C. White III (1991). Partially observed Markov decision processes: A survey. Annals of Operations Research, 32.

  30. N. L. Zhang, S. S. Lee, and W. Zhang (1999). A method for speeding up value iteration in partially observable Markov decision processes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.

  31. N. L. Zhang and W. Liu (1997). A model approximation scheme for planning in partially observable stochastic domains. Journal of Artificial Intelligence Research, 7:199–230.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Choi, S.P.M., Yeung, D.Y., Zhang, N.L. (2000). Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science, vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_12

  • DOI: https://doi.org/10.1007/3-540-44565-X_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4

