Abstract
The field of reinforcement learning (RL) has made great strides in learning control knowledge from closed-loop interaction with environments. "Classical" RL, based on atomic state space representations, suffers from an inability to adapt to nonstationarities in the target Markov decision process (i.e., environment). Relational RL is widely seen as a potential solution to this shortcoming. In this paper, we demonstrate a class of "pseudo-relational" learning methods for nonstationary navigational RL domains – domains in which the location of the goal, or even the structure of the environment, can change over time. Our approach is closely related to deictic representations, which have previously been found to be troublesome for RL. The key insight of this paper is that navigational problems are a highly constrained class of MDPs, possessing a strong native topology that relaxes some of the partial observability difficulties arising from deixis. Agents can act effectively by employing local information relevant to their near-term action choices. We demonstrate that, unlike with an atomic representation, our agents can learn to fluidly adapt to changing goal locations and environment structure.
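To make the representational contrast concrete, the following is an illustrative sketch (not the paper's implementation): an "atomic" state is an absolute grid coordinate tied to one specific maze and goal layout, whereas a deictic-style observation encodes only local structure (which neighboring cells are blocked) plus a coarse egocentric direction toward the goal. The grid, function names, and feature choices here are hypothetical, chosen only to show why the local encoding can transfer when the goal or maze changes.

```python
# Hypothetical grid-world: '#' is a wall, '.' is open, 'G' marks the goal.
GRID = [
    "#####",
    "#.G.#",
    "#.#.#",
    "#...#",
    "#####",
]

def atomic_state(pos):
    # Absolute (row, col): meaningless if the maze or goal is relocated,
    # so a value function indexed this way must be relearned from scratch.
    return pos

def deictic_observation(grid, pos, goal):
    r, c = pos
    gr, gc = goal
    # Local structure: which of the four neighboring cells are blocked.
    walls = tuple(grid[r + dr][c + dc] == "#"
                  for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)])
    # Egocentric cue: sign of the offset toward the goal in each axis.
    direction = ((gr > r) - (gr < r), (gc > c) - (gc < c))
    # No absolute coordinates appear in the observation, so the same
    # (observation -> action) mapping can apply after the goal moves.
    return walls + direction

goal = (1, 2)
print(atomic_state((3, 2)))                      # (3, 2)
print(deictic_observation(GRID, (3, 2), goal))   # (True, True, False, False, -1, 0)
```

The cost of this locality is the partial observability the abstract mentions: distinct absolute positions can produce identical observations. The paper's point is that the strong native topology of navigation tasks keeps this aliasing manageable.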
Cite this paper
Lane, T., Ridens, M., Stevens, S. (2007). Reinforcement Learning in Nonstationary Environment Navigation Tasks. In: Kobti, Z., Wu, D. (eds) Advances in Artificial Intelligence. Canadian AI 2007. Lecture Notes in Computer Science, vol. 4509. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72665-4_37
Print ISBN: 978-3-540-72664-7
Online ISBN: 978-3-540-72665-4