Abstract
We present a new adaptive connectionist planning method. Through interaction with an environment, a world model is progressively constructed using the backpropagation learning algorithm. The planner builds a look-ahead plan by iteratively applying this model to predict future reinforcements. Predicted future reinforcement is maximized to derive suboptimal (i.e., good but not provably optimal) plans, so that good actions are determined directly from the knowledge encoded in the model network (strategic level). This maximization is carried out by gradient descent in action space.
The problem of finding good initial plans is solved by the use of an "experience" network (intuition level). The appropriateness of this planning method for finding suboptimal actions in unknown environments is demonstrated on a target-tracking problem.
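The core mechanism of the strategic level can be illustrated with a minimal sketch. The paper's actual networks are not reproduced here; instead, a hypothetical differentiable "world model" with hand-chosen weights stands in for a trained network that predicts reinforcement from a state and an action, and the planner refines an initial action by gradient ascent on that prediction (equivalently, gradient descent in action space on the negative reinforcement):

```python
import numpy as np

def predicted_reinforcement(state, action, W):
    # Toy stand-in for a trained model network: reinforcement is highest
    # when the action matches a state-dependent target W @ state.
    target = W @ state
    return -np.sum((action - target) ** 2)

def grad_wrt_action(state, action, W):
    # Analytic gradient of the toy model's output w.r.t. the action.
    target = W @ state
    return -2.0 * (action - target)

def plan_action(state, init_action, W, lr=0.1, steps=200):
    """Refine an initial plan by gradient ascent on predicted reinforcement."""
    action = init_action.copy()
    for _ in range(steps):
        action += lr * grad_wrt_action(state, action, W)
    return action

state = np.array([1.0, -0.5])
W = np.array([[0.5, 0.0], [0.0, 2.0]])  # hypothetical "trained" weights
a0 = np.zeros(2)                         # crude initial plan
a_star = plan_action(state, a0, W)
# a_star converges toward the reinforcement-maximizing action W @ state
```

In the paper's setting the initial plan `a0` would come from the experience network (intuition level) rather than zeros, and the gradient would be obtained by backpropagating through the model network over multiple predicted time steps rather than from a closed-form expression.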
© 1990 Springer-Verlag Berlin Heidelberg
Thrun, S., Möller, K., Linden, A. (1990). Adaptive Look-Ahead Planning. In: Dorffner, G. (eds) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Informatik-Fachberichte, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76070-9_29
Print ISBN: 978-3-540-53131-9
Online ISBN: 978-3-642-76070-9