Abstract
The field of reinforcement learning (RL) has made great strides in learning control knowledge from closed-loop interaction with environments. "Classical" RL, based on atomic state space representations, suffers from an inability to adapt to nonstationarities in the target Markov decision process (i.e., environment). Relational RL is widely seen as a potential solution to this shortcoming. In this paper, we demonstrate a class of "pseudo-relational" learning methods for nonstationary navigational RL domains – domains in which the location of the goal, or even the structure of the environment, can change over time. Our approach is closely related to deictic representations, which have previously been found to be troublesome for RL. The key insight of this paper is that navigational problems are a highly constrained class of MDPs, possessing a strong native topology that relaxes some of the partial observability difficulties arising from deixis. Agents can act effectively by employing local information relevant to their near-term action choices. We demonstrate that, unlike with an atomic representation, our agents can learn to fluidly adapt to changing goal locations and environment structure.
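To make the representational contrast concrete, the following is an illustrative sketch (not the paper's implementation): an "atomic" state is an absolute grid coordinate tied to one specific maze and goal layout, whereas a deictic-style observation encodes only local structure (which neighboring cells are blocked) plus a coarse egocentric direction toward the goal. The grid, function names, and feature choices here are hypothetical, chosen only to show why the local encoding can transfer when the goal or maze changes.

```python
# Hypothetical grid-world: '#' is a wall, '.' is open, 'G' marks the goal.
GRID = [
    "#####",
    "#.G.#",
    "#.#.#",
    "#...#",
    "#####",
]

def atomic_state(pos):
    # Absolute (row, col): meaningless if the maze or goal is relocated,
    # so a value function indexed this way must be relearned from scratch.
    return pos

def deictic_observation(grid, pos, goal):
    r, c = pos
    gr, gc = goal
    # Local structure: which of the four neighboring cells are blocked.
    walls = tuple(grid[r + dr][c + dc] == "#"
                  for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)])
    # Egocentric cue: sign of the offset toward the goal in each axis.
    direction = ((gr > r) - (gr < r), (gc > c) - (gc < c))
    # No absolute coordinates appear in the observation, so the same
    # (observation -> action) mapping can apply after the goal moves.
    return walls + direction

goal = (1, 2)
print(atomic_state((3, 2)))                      # (3, 2)
print(deictic_observation(GRID, (3, 2), goal))   # (True, True, False, False, -1, 0)
```

The cost of this locality is the partial observability the abstract mentions: distinct absolute positions can produce identical observations. The paper's point is that the strong native topology of navigation tasks keeps this aliasing manageable.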
Cite this paper
Lane, T., Ridens, M., Stevens, S. (2007). Reinforcement Learning in Nonstationary Environment Navigation Tasks. In: Kobti, Z., Wu, D. (eds) Advances in Artificial Intelligence. Canadian AI 2007. Lecture Notes in Computer Science, vol. 4509. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72665-4_37
Print ISBN: 978-3-540-72664-7
Online ISBN: 978-3-540-72665-4