Adaptive Look-Ahead Planning

  • Conference paper

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 252)

Abstract

We present a new adaptive connectionist planning method. Through interaction with an environment, a world model is progressively constructed using the backpropagation learning algorithm. The planner builds a look-ahead plan by iteratively using this model to predict future reinforcement. The predicted future reinforcement is then maximized to derive suboptimal plans, thus determining good actions directly from the knowledge of the model network (strategic level). This is done by gradient descent in action space.
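
To make the strategic level concrete, the following is a minimal sketch, assuming a world model and a reinforcement predictor have already been trained by backpropagation and are differentiable. The names world_model, reward_model, and plan, as well as the step count and learning rate, are illustrative assumptions rather than the paper's notation; the sketch only shows gradient ascent on predicted future reinforcement with respect to the action sequence.

    import jax
    import jax.numpy as jnp

    def predicted_return(actions, state, world_model, reward_model):
        # Roll the learned world model forward over the action sequence and
        # accumulate the reinforcement it predicts for each step.
        # reward_model(state, a) is assumed to return a scalar prediction.
        total = 0.0
        for a in actions:                     # actions: array of shape (T, action_dim)
            state = world_model(state, a)     # model predicts the next state
            total = total + reward_model(state, a)
        return total

    def plan(initial_actions, state, world_model, reward_model, steps=50, lr=0.1):
        # Improve a candidate plan by gradient ascent on the predicted return.
        # The gradient is taken with respect to the actions, not the weights.
        grad_fn = jax.grad(predicted_return)  # d(predicted return) / d(actions)
        actions = initial_actions
        for _ in range(steps):
            actions = actions + lr * grad_fn(actions, state, world_model, reward_model)
        return actions

A call such as plan(initial_actions, state, world_model, reward_model) returns an improved action sequence; only the actions are adjusted at planning time, while the model network's weights stay fixed.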

The problem of finding good initial plans is solved by using an “experience” network (intuition level). The suitability of this planning method for finding suboptimal actions in unknown environments is demonstrated on a target-tracking problem.
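
The intuition level can be sketched in the same spirit: a hypothetical experience_net(params, state) proposes the initial action sequence, and once the look-ahead planner has refined it, the refined plan serves as a training target so that later proposals start closer to good plans. The squared-error training signal and all names below are illustrative assumptions, not the paper's exact scheme.

    import jax
    import jax.numpy as jnp

    def propose_plan(params, experience_net, state, horizon, action_dim):
        # Initial plan taken from the (hypothetical) experience network
        # instead of a random guess.
        return experience_net(params, state).reshape(horizon, action_dim)

    def experience_loss(params, experience_net, state, refined_plan):
        # Squared error between the network's proposal and the plan found
        # by gradient-based look-ahead planning.
        proposal = experience_net(params, state).reshape(refined_plan.shape)
        return jnp.mean((proposal - refined_plan) ** 2)

    def update_experience(params, experience_net, state, refined_plan, lr=0.01):
        # One gradient step pulling the experience network toward the refined plan.
        grads = jax.grad(experience_loss)(params, experience_net, state, refined_plan)
        return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)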

Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thrun, S., Möller, K., Linden, A. (1990). Adaptive Look-Ahead Planning. In: Dorffner, G. (eds) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Informatik-Fachberichte, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76070-9_29

  • DOI: https://doi.org/10.1007/978-3-642-76070-9_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-53131-9

  • Online ISBN: 978-3-642-76070-9
