Abstract
This paper considers the application of reinforcement learning to path-finding tasks in continuous state spaces in the presence of obstacles. We show that cumulative evaluation functions (such as Q-functions [28] and V-functions [4]) may be discontinuous if forbidden regions (as implied by obstacles) exist in state space. Since the infinite number of states requires the use of function approximators such as backpropagation networks [16, 12, 24], we argue that these discontinuities cause severe difficulties in learning cumulative evaluation functions. The discontinuities we detected may also explain why recent applications of reinforcement learning systems to complex tasks [12] failed to show the desired performance. In our conclusion, we outline some ideas for circumventing the problem.
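The discontinuity the abstract refers to can be seen already in a discrete approximation. The following sketch (not from the paper; grid layout, goal position, and unit step cost are illustrative assumptions) computes the optimal steps-to-goal value function on a small grid with an obstacle wall: two spatially adjacent states on opposite sides of the wall receive sharply different values, because every path between them must detour through a gap.

```python
from collections import deque

def value_function(width, height, goal, obstacles):
    """Optimal steps-to-goal under 4-neighbour moves; obstacle cells are forbidden.

    Breadth-first search outward from the goal yields the shortest-path cost,
    i.e. the (negated) optimal V-function for a unit step cost.
    """
    V = {goal: 0}
    frontier = deque([(goal, 0)])
    while frontier:
        (x, y), d = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in obstacles and (nx, ny) not in V):
                V[(nx, ny)] = d + 1
                frontier.append(((nx, ny), d + 1))
    return V

# A vertical wall with a single gap at the top (y = 9).
wall = {(5, y) for y in range(10) if y != 9}
V = value_function(11, 10, goal=(10, 0), obstacles=wall)

# (4, 0) and (6, 0) are two cells apart, yet their values differ by 20:
# the left cell must travel all the way up through the gap and back down.
left, right = V[(4, 0)], V[(6, 0)]
print(left, right, abs(left - right))
```

A smooth function approximator fit to such a V-function must reproduce this jump across the wall, which is exactly the difficulty the paper analyses for the continuous case.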
References
C. W. Anderson. Learning and problem solving with multilayer connectionist systems. Technical Report COINS TR 86-50, Dept. of Computer and Information Science, University of Massachusetts, Amherst, MA, 1986.
A. G. Barto. Simulation Experiments with Goal-Seeking Adaptive Elements. COINS, Amherst, Massachusetts, 1984. AFWAL-TR-84-1022.
A. G. Barto, S. J. Bradtke, and S. P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical report, University of Massachusetts, Department of Computer Science, Amherst MA 01003, August 1991.
A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. Technical Report COINS 89-95, Department of Computer Science, University of Massachusetts, MA, September 1989.
A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. In M. Gabriel and J. W. Moore, editors, Learning and Computational Neuroscience, pages 539–602. MIT Press, Massachusetts, 1990.
R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.
P. Dayan. The convergence of TD(λ) for general λ. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
D. Fox, V. Heinze, K. Möller, and S. B. Thrun. Learning by error driven decomposition. In Proc. of NeuroNimes, France, 1991.
G. E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40:185–234, 1989.
R. E. Korf. Real-time heuristic search: New results. In AAAI-88, pages 139–143, 1988.
P. R. Kumar. A survey of some results in stochastic adaptive control. SIAM Journal of Control and Optimization, 23:329–380, 1985.
L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning Journal, 8(3/4), 1992. Special Issue on Reinforcement Learning.
A. Linden. Untersuchung von Backpropagation in konnektionistischen Systemen. Diplomarbeit, Universität Bonn, Informatik-Institutsbericht Nr. 80, 1990.
J. del R. Millán and C. Torras. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
M. Minsky. Steps toward artificial intelligence. In E.A. Feigenbaum and J. Feldman, editors, Computers and Thought, pages 406–450. McGraw-Hill, 1961.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing. Vol. I + II. MIT Press, 1986.
A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210–229, 1959. Reprinted in E. A. Feigenbaum and J. Feldman (Eds.) 1963, Computers and Thought, McGraw-Hill, New York.
A. L. Samuel. Some studies in machine learning using the game of checkers. II — Recent progress. IBM Journal on Research and Development, pages 601–617, 1967.
F. J. Śmieja and H. Mühlenbein. Reflective modular neural network systems. Technical Report 633, GMD, Sankt Augustin, Germany, February 1992.
F. J. Śmieja. Multiple network systems (MINOS) modules: Task division and module discrimination. In Proceedings of the 8th AISB Conference on Artificial Intelligence, Leeds, 16–19 April 1991.
R. S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, 1984.
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, June 1990.
R. S. Sutton, A. G. Barto, and R. J. Williams. Reinforcement Learning is Direct Adaptive Optimal Control. In Proceedings of the 1991 American Control Conference, 1991.
G. Tesauro. Practical issues in temporal difference learning. Machine Learning Journal, 8(3/4), 1992. Special Issue on Reinforcement Learning.
S. B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, Pennsylvania, January 1992.
C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, 1989.
C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
P. Werbos. Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3:179–189, 1990.
P. Werbos and J. Titus. An empirical test of new forecasting methods derived from a theory of intelligence: The prediction of conflict in Latin America. IEEE Transactions on Systems, Man, and Cybernetics, SMC-8:657–666, 1978.
P. J. Werbos. Backpropagation and neurocontrol: A review and prospectus. In Proceedings of IJCNN-89, Washington, pages I-209–216, 1989.
S. D. Whitehead. A study of cooperative mechanisms for faster reinforcement learning. Technical Report 365, University of Rochester, Computer Science Department, Rochester, NY, March 1991.
S. D. Whitehead and D. H. Ballard. Active perception and reinforcement learning. Neural Computation, 2:409–419, 1990.
R. J. Williams. Reinforcement-learning connectionist systems. Technical Report NU-CCS-87-3, College of Computer Science, Northeastern University, Boston, 1987.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
Cite this paper
Linden, A. (1993). On discontinuous Q-Functions in reinforcement learning. In: Jürgen Ohlbach, H. (eds) GWAI-92: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0019005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56667-0
Online ISBN: 978-3-540-47626-9