Abstract
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches – lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm – to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized k-NN was having difficulty because it did not have good examples – a common source of difficulty for lazy learning. Therefore, we used the genetic algorithm as a bootstrapping method for k-NN to create a system to provide these examples. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy – outperforming either method alone – and with relatively small memory requirements.
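The bootstrapping idea described above can be illustrated with a minimal sketch: a genetic algorithm evolves a pursuit policy, the best individual is used to generate (state, action) examples, and a k-nearest-neighbor learner then controls the pursuer from those stored examples alone. Everything here is an illustrative assumption — the 1-D toy game, the one-parameter threshold policy, and the GA operators are not the paper's actual pursuit game, encoding, or learning setup.

```python
import random

# Hypothetical miniature, NOT the paper's game: a pursuer at position p
# chases an evader at position e on a line. A policy maps the signed
# gap (e - p) to a pursuer move in {-1.0, 0.0, +1.0}.

def play(policy, steps=30):
    """Run one game; return the remaining |gap| (smaller is better)."""
    p, e = 0.0, 10.0
    for _ in range(steps):
        e += 0.4                 # evader drifts away at a fixed speed
        p += policy(e - p)       # pursuer moves according to the policy
    return abs(e - p)

def make_policy(threshold):
    """One-parameter rule: chase once the gap exceeds `threshold`."""
    def policy(gap):
        if gap > threshold:
            return 1.0
        if gap < -threshold:
            return -1.0
        return 0.0
    return policy

def evolve(pop_size=20, generations=15, rng=random.Random(0)):
    """Tiny GA over the threshold parameter; fitness = final closeness."""
    pop = [rng.uniform(0.0, 20.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: play(make_policy(t)))   # lower gap = fitter
        survivors = pop[:pop_size // 2]                # truncation selection
        mutants = [t + rng.gauss(0.0, 1.0) for t in survivors]
        pop = survivors + mutants
    return pop[0]                                      # best threshold found

def bootstrap_examples(policy, n=50, rng=random.Random(1)):
    """Record (gap, action) pairs from the evolved policy: the 'teacher'."""
    return [(g, policy(g)) for g in (rng.uniform(-20, 20) for _ in range(n))]

def knn_action(examples, gap, k=3):
    """Lazy learner: majority action among the k nearest stored gaps."""
    nearest = sorted(examples, key=lambda ex: abs(ex[0] - gap))[:k]
    actions = [a for _, a in nearest]
    return max(set(actions), key=actions.count)

best = evolve()
examples = bootstrap_examples(make_policy(best))
# The memory-based controller now plays using only the stored examples.
final_gap = play(lambda gap: knn_action(examples, gap))
```

The design point mirrors the abstract: k-NN alone would need good examples to succeed, and the GA supplies them, so the joint system can act from a small example memory without re-running the evolutionary search at decision time.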
Sheppard, J.W., Salzberg, S.L. A Teaching Strategy for Memory-Based Control. Artificial Intelligence Review 11, 343–370 (1997). https://doi.org/10.1023/A:1006597715165