On discontinuous Q-Functions in reinforcement learning

  • Technical Papers
  • Conference paper
GWAI-92: Advances in Artificial Intelligence

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 671)

Abstract

This paper considers the application of reinforcement learning to path finding tasks in continuous state space in the presence of obstacles. We show that cumulative evaluation functions (such as Q-Functions [28] and V-Functions [4]) may be discontinuous if forbidden regions (such as those implied by obstacles) exist in state space. Since the infinite number of states requires the use of function approximators such as backpropagation nets [16, 12, 24], we argue that these discontinuities cause severe difficulties in learning cumulative evaluation functions. The discontinuities we detected might also explain why recent applications of reinforcement learning systems to complex tasks [12] have failed to show the desired performance. In our conclusion, we outline some ideas for circumventing the problem.
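
To make this concrete, the short sketch below (an added illustration, not an experiment from the paper; it assumes a discretised grid world, with grid size, wall placement and goal position chosen arbitrarily) computes the optimal cost-to-go V by breadth-first search around a wall with a single gap and prints the values of two cells that sit directly on opposite sides of that wall.

    # Minimal sketch (illustration only, not code from the paper): breadth-first
    # search computes the optimal cost-to-go V(s), i.e. the number of steps to the
    # goal, on a small grid world whose state space is split by a wall with a
    # single gap. Grid size, wall placement and goal position are arbitrary
    # assumptions made for this example.
    from collections import deque

    import numpy as np

    H, W = 11, 11
    wall_col = 5                    # vertical wall at this column ...
    gap_row = 0                     # ... with a single gap in the top row
    goal = (H // 2, W - 1)          # goal on the right-hand side of the wall

    blocked = np.zeros((H, W), dtype=bool)
    blocked[:, wall_col] = True
    blocked[gap_row, wall_col] = False

    # Breadth-first search outward from the goal yields the optimal step count V(s).
    V = np.full((H, W), np.inf)
    V[goal] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and not blocked[nr, nc] and V[nr, nc] == np.inf:
                V[nr, nc] = V[r, c] + 1
                queue.append((nr, nc))

    # Two cells that are spatial neighbours but lie on opposite sides of the wall:
    left, right = (H - 1, wall_col - 1), (H - 1, wall_col + 1)
    print("V just left of the wall :", V[left])    # 21.0 (long detour via the gap)
    print("V just right of the wall:", V[right])   #  9.0 (short direct path)

Although the two printed positions are adjacent in the underlying continuous space, their values differ by twelve steps; in the continuous limit this jump in V, and likewise in the Q-values derived from it, is the kind of discontinuity the paper analyses.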

References

  1. C. W. Anderson. Learning and problem solving with multilayer connectionist systems. Technical Report COINS TR 86-50, Dept. of Computer and Information Science, University of Massachusetts, Amherst, MA, 1986.

  2. A. G. Barto. Simulation Experiments with Goal-Seeking Adaptive Elements. COINS, Amherst, Massachusetts, 1984. AFWAL-TR-84-1022.

  3. A. G. Barto, S. J. Bradtke, and S. P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical report, University of Massachusetts, Department of Computer Science, Amherst, MA 01003, August 1991.

  4. A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. Technical Report COINS 89-95, Department of Computer Science, University of Massachusetts, MA, September 1989.

  5. A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. In M. Gabriel and J. W. Moore, editors, Learning and Computational Neuroscience, pages 539–602. MIT Press, Massachusetts, 1990.

  6. R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.

  7. P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), May 1992. Special Issue on Reinforcement Learning.

  8. D. Fox, V. Heinze, K. Möller, and S. B. Thrun. Learning by error driven decomposition. In Proceedings of Neuro-Nîmes, France, 1991.

  9. G. E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40:185–234, 1989.

  10. R. E. Korf. Real-time heuristic search: New results. In Proceedings of AAAI-88, pages 139–143, 1988.

  11. P. R. Kumar. A survey of some results in stochastic adaptive control. SIAM Journal on Control and Optimization, 23:329–380, 1985.

  12. L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3/4), 1992. Special Issue on Reinforcement Learning.

  13. A. Linden. Untersuchung von Backpropagation in konnektionistischen Systemen [Investigation of backpropagation in connectionist systems]. Diplomarbeit, Universität Bonn, Informatik-Institutsbericht Nr. 80, 1990.

  14. J. del R. Millán and C. Torras. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Machine Learning, 8(3/4), May 1992. Special Issue on Reinforcement Learning.

  15. M. Minsky. Steps toward artificial intelligence. In E. A. Feigenbaum and J. Feldman, editors, Computers and Thought, pages 406–450. McGraw-Hill, 1961.

  16. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol. I + II. MIT Press, 1986.

  17. A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, pages 210–229, 1959. Reprinted in E. A. Feigenbaum and J. Feldman (Eds.), Computers and Thought, McGraw-Hill, New York, 1963.

  18. A. L. Samuel. Some studies in machine learning using the game of checkers. II — Recent progress. IBM Journal of Research and Development, pages 601–617, 1967.

  19. F. J. Śmieja and H. Mühlenbein. Reflective modular neural network systems. Technical Report 633, GMD, Sankt Augustin, Germany, February 1992.

  20. F. J. Śmieja. Multiple network systems (MINOS) modules: Task division and module discrimination. In Proceedings of the 8th AISB Conference on Artificial Intelligence, Leeds, 16–19 April 1991.

  21. R. S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, 1984.

  22. R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.

  23. R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, June 1990.

  24. R. S. Sutton, A. G. Barto, and R. J. Williams. Reinforcement learning is direct adaptive optimal control. In Proceedings of the 1991 American Control Conference, 1991.

  25. G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8(3/4), 1992. Special Issue on Reinforcement Learning.

  26. S. B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, Pennsylvania, January 1992.

  27. C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, 1989.

  28. C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3/4), May 1992. Special Issue on Reinforcement Learning.

  29. P. Werbos. Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3:179–189, 1990.

  30. P. Werbos and J. Titus. An empirical test of new forecasting methods derived from a theory of intelligence: The prediction of conflict in Latin America. IEEE Transactions on Systems, Man, and Cybernetics, SMC-8:657–666, 1978.

  31. P. J. Werbos. Backpropagation and neurocontrol: A review and prospectus. In Proceedings of IJCNN-89, Washington, pages I-209–216, 1989.

  32. S. D. Whitehead. A study of cooperative mechanisms for faster reinforcement learning. Technical Report 365, University of Rochester, Computer Science Department, Rochester, NY, March 1991.

  33. S. D. Whitehead and D. H. Ballard. Active perception and reinforcement learning. Neural Computation, 2:409–419, 1990.

  34. R. J. Williams. Reinforcement-learning connectionist systems. Technical Report NU-CCS-87-3, College of Computer Science, Northeastern University, Boston, 1987.

Author information

A. Linden

Editor information

Hans Jürgen Ohlbach

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Linden, A. (1993). On discontinuous Q-Functions in reinforcement learning. In: Ohlbach, H. J. (ed.) GWAI-92: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0019005

  • DOI: https://doi.org/10.1007/BFb0019005

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56667-0

  • Online ISBN: 978-3-540-47626-9
