Abstract
This paper considers the application of reinforcement learning to path-finding tasks in continuous state spaces in the presence of obstacles. We show that cumulative evaluation functions (such as Q-functions [28] and V-functions [4]) may be discontinuous if forbidden regions (as implied by obstacles) exist in state space. Since the infinite number of states requires the use of function approximators such as backpropagation networks [16, 12, 24], we argue that these discontinuities cause severe difficulties in learning cumulative evaluation functions. The discontinuities we detected may also explain why recent applications of reinforcement learning systems to complex tasks [12] failed to show the desired performance. In our conclusion, we outline some ideas for circumventing the problem.
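The discontinuity the abstract refers to can be seen already in a discrete approximation. The following sketch (not from the paper; grid layout, goal position, and unit step cost are illustrative assumptions) computes the optimal steps-to-goal value function on a small grid with an obstacle wall: two spatially adjacent states on opposite sides of the wall receive sharply different values, because every path between them must detour through a gap.

```python
from collections import deque

def value_function(width, height, goal, obstacles):
    """Optimal steps-to-goal under 4-neighbour moves; obstacle cells are forbidden.

    Breadth-first search outward from the goal yields the shortest-path cost,
    i.e. the (negated) optimal V-function for a unit step cost.
    """
    V = {goal: 0}
    frontier = deque([(goal, 0)])
    while frontier:
        (x, y), d = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in obstacles and (nx, ny) not in V):
                V[(nx, ny)] = d + 1
                frontier.append(((nx, ny), d + 1))
    return V

# A vertical wall with a single gap at the top (y = 9).
wall = {(5, y) for y in range(10) if y != 9}
V = value_function(11, 10, goal=(10, 0), obstacles=wall)

# (4, 0) and (6, 0) are two cells apart, yet their values differ by 20:
# the left cell must travel all the way up through the gap and back down.
left, right = V[(4, 0)], V[(6, 0)]
print(left, right, abs(left - right))
```

A smooth function approximator fit to such a V-function must reproduce this jump across the wall, which is exactly the difficulty the paper analyses for the continuous case.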
References
C. W. Anderson. Learning and problem solving with multilayer connectionist systems. Technical Report COINS TR 86-50, Dept. of Computer and Information Science, University of Massachusetts, Amherst, MA, 1986.
A. G. Barto. Simulation Experiments with Goal-Seeking Adaptive Elements. COINS, Amherst, Massachusetts, 1984. AFWAL-TR-84-1022.
A. G. Barto, S. J. Bradtke, and S. P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical report, University of Massachusetts, Department of Computer Science, Amherst MA 01003, August 1991.
A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. Technical Report COINS 89-95, Department of Computer Science, University of Massachusetts, MA, September 1989.
A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. In M. Gabriel and J. W. Moore, editors, Learning and Computational Neuroscience, pages 539–602. MIT Press, Massachusetts, 1990.
R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.
P. Dayan. The convergence of TD(λ) for general λ. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
D. Fox, V. Heinze, K. Möller, and S. B. Thrun. Learning by error driven decomposition. In Proc. of NeuroNimes, France, 1991.
G. E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40:185–234, 1989.
R. E. Korf. Real-time heuristic search: New results. In AAAI-88, pages 139–143, 1988.
P. R. Kumar. A survey of some results in stochastic adaptive control. SIAM Journal of Control and Optimization, 23:329–380, 1985.
L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning Journal, 8(3/4), 1992. Special Issue on Reinforcement Learning.
A. Linden. Untersuchung von Backpropagation in konnektionistischen Systemen. Diplomarbeit, Universität Bonn, Informatik-Institutsbericht Nr. 80, 1990.
J. del R. Millán and C. Torras. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
M. Minsky. Steps toward artificial intelligence. In E.A. Feigenbaum and J. Feldman, editors, Computers and Thought, pages 406–450. McGraw-Hill, 1961.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing. Vol. I + II. MIT Press, 1986.
A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210–229, 1959. Reprinted in E. A. Feigenbaum and J. Feldman (Eds.) 1963, Computers and Thought, McGraw-Hill, New York.
A. L. Samuel. Some studies in machine learning using the game of checkers. II — Recent progress. IBM Journal on Research and Development, pages 601–617, 1967.
F. J. Śmieja and H. Mühlenbein. Reflective modular neural network systems. Technical Report 633, GMD, Sankt Augustin, Germany, February 1992.
F. J. Śmieja. Multiple network systems (MINOS) modules: Task division and module discrimination. In Proceedings of the 8th AISB Conference on Artificial Intelligence, Leeds, 16–19 April 1991.
R. S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, 1984.
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, June 1990.
R. S. Sutton, A. G. Barto, and R. J. Williams. Reinforcement Learning is Direct Adaptive Optimal Control. In Proceedings of the 1991 American Control Conference, 1991.
G. Tesauro. Practical issues in temporal difference learning. Machine Learning Journal, 8(3/4), 1992. Special Issue on Reinforcement Learning.
S. B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, Pennsylvania, January 1992.
C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, 1989.
C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning Journal, 8(3/4), May 1992. Special Issue on Reinforcement Learning.
P. Werbos. Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3:179–189, 1990.
P. Werbos and J. Titus. An empirical test of new forecasting methods derived from a theory of intelligence: The prediction of conflict in Latin America. IEEE Transactions on Systems, Man, and Cybernetics, SMC-8:657–666, 1978.
P. J. Werbos. Backpropagation and neurocontrol: A review and prospectus. In Proceedings of IJCNN-89, Washington, pages I-209–216, 1989.
S. D. Whitehead. A study of cooperative mechanisms for faster reinforcement learning. Technical Report 365, University of Rochester, Computer Science Department, Rochester, NY, March 1991.
S. D. Whitehead and D. H. Ballard. Active perception and reinforcement learning. Neural Computation, 2:409–419, 1990.
R. J. Williams. Reinforcement-learning connectionist systems. Technical Report NU-CCS-87-3, College of Computer Science, Northeastern University, Boston, 1987.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
Cite this paper
Linden, A. (1993). On discontinuous Q-Functions in reinforcement learning. In: Jürgen Ohlbach, H. (eds) GWAI-92: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0019005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56667-0
Online ISBN: 978-3-540-47626-9