Abstract
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous and high-dimensional, the resulting problem is highly intractable. Many practical heuristic methods have been proposed, but most require a complete POMDP model of the environment, which is not always available. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. The article describes an enhancement of the original U-Tree's state-generation process that makes the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against traditional model-based algorithms on a set of well-known POMDP problems.
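The belief-state transformation mentioned above can be illustrated with a minimal sketch. The function below is not part of the article's algorithm; it is a standard Bayesian belief update, with the indexing convention for the (hypothetical) transition table `T` and observation table `O` chosen purely for illustration.

```python
def belief_update(belief, action, obs, T, O):
    """One Bayesian belief-state update for a discrete POMDP.

    belief : list of P(s) over states
    T[s][a][s2] : P(s2 | s, a)   (assumed layout, for illustration)
    O[s2][a][o] : P(o | s2, a)   (assumed layout, for illustration)
    """
    n = len(belief)
    # Unnormalized posterior: b'(s2) = O(o|s2,a) * sum_s T(s2|s,a) * b(s)
    new_b = [O[s2][action][obs] * sum(T[s][action][s2] * belief[s]
                                      for s in range(n))
             for s2 in range(n)]
    z = sum(new_b)  # normalizing constant = P(o | b, a)
    return [p / z for p in new_b] if z > 0 else belief
```

Running this update after every action/observation pair turns the POMDP into an MDP whose (continuous) state is the belief vector, which is exactly why exact solutions scale poorly with the number of underlying states.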
Zheng, L., Cho, SY. A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems. Neural Process Lett 33, 187–200 (2011). https://doi.org/10.1007/s11063-011-9172-2