Abstract
We present a new adaptive connectionist planning method. Through interaction with an environment, a world model is progressively constructed using the backpropagation learning algorithm. The planner builds a look-ahead plan by iteratively applying this model to predict future reinforcements. Predicted future reinforcement is maximized to derive suboptimal (i.e., good but not provably optimal) plans, so that good actions are determined directly from the knowledge encoded in the model network (strategic level). This maximization is carried out by gradient descent in action space.
The problem of finding good initial plans is solved by the use of an "experience" network (intuition level). The appropriateness of this planning method for finding suboptimal actions in unknown environments is demonstrated on a target-tracking problem.
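The core mechanism of the strategic level can be illustrated with a minimal sketch. The paper's actual networks are not reproduced here; instead, a hypothetical differentiable "world model" with hand-chosen weights stands in for a trained network that predicts reinforcement from a state and an action, and the planner refines an initial action by gradient ascent on that prediction (equivalently, gradient descent in action space on the negative reinforcement):

```python
import numpy as np

def predicted_reinforcement(state, action, W):
    # Toy stand-in for a trained model network: reinforcement is highest
    # when the action matches a state-dependent target W @ state.
    target = W @ state
    return -np.sum((action - target) ** 2)

def grad_wrt_action(state, action, W):
    # Analytic gradient of the toy model's output w.r.t. the action.
    target = W @ state
    return -2.0 * (action - target)

def plan_action(state, init_action, W, lr=0.1, steps=200):
    """Refine an initial plan by gradient ascent on predicted reinforcement."""
    action = init_action.copy()
    for _ in range(steps):
        action += lr * grad_wrt_action(state, action, W)
    return action

state = np.array([1.0, -0.5])
W = np.array([[0.5, 0.0], [0.0, 2.0]])  # hypothetical "trained" weights
a0 = np.zeros(2)                         # crude initial plan
a_star = plan_action(state, a0, W)
# a_star converges toward the reinforcement-maximizing action W @ state
```

In the paper's setting the initial plan `a0` would come from the experience network (intuition level) rather than zeros, and the gradient would be obtained by backpropagating through the model network over multiple predicted time steps rather than from a closed-form expression.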
© 1990 Springer-Verlag Berlin Heidelberg
Thrun, S., Möller, K., Linden, A. (1990). Adaptive Look-Ahead Planning. In: Dorffner, G. (eds) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Informatik-Fachberichte, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76070-9_29
Print ISBN: 978-3-540-53131-9
Online ISBN: 978-3-642-76070-9