
A Teaching Strategy for Memory-Based Control


Abstract

Combining different machine learning algorithms in the same system can produce benefits beyond what any single method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches – lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm – to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized that k-NN was having difficulty because it lacked good examples – a common source of difficulty for lazy learning. We therefore used the genetic algorithm as a bootstrapping method to generate these examples for k-NN. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy – outperforming either method alone – and with relatively small memory requirements.
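To make the teaching strategy concrete, the following is a minimal Python sketch of the two-phase bootstrapping idea the abstract describes. It is not the paper's implementation: the toy pursuit-game dynamics, the capture radius and speeds, and the hand-coded teacher_action heuristic (standing in for the GA-evolved policy) are all illustrative assumptions. It only shows the structure: a strong teacher plays the game and its state-action pairs are stored, then a lazy k-NN controller acts purely by consulting its nearest stored examples.

import math
import random

# Toy pursuit game (illustrative only): a pursuer picks a bounded heading
# change each step and tries to close within CAPTURE_RADIUS of an evader
# that flees directly away. The dynamics, speeds, and the hand-coded
# teacher below are assumptions standing in for the paper's GA-evolved policy.

CAPTURE_RADIUS = 1.0
PURSUER_SPEED = 0.5
EVADER_SPEED = 0.3

def relative_state(px, py, ph, ex, ey):
    """State as the pursuer sees it: range and bearing to the evader."""
    dx, dy = ex - px, ey - py
    rng = math.hypot(dx, dy)
    bearing = (math.atan2(dy, dx) - ph + math.pi) % (2 * math.pi) - math.pi
    return (rng, bearing)

def teacher_action(state):
    """Stand-in teacher: turn toward the evader with a bounded turn rate."""
    _, bearing = state
    return max(-0.3, min(0.3, bearing))

def knn_action(memory, state, k=3):
    """Lazy k-NN controller: average the actions of the k nearest examples."""
    rng, bearing = state
    nearest = sorted(memory, key=lambda m: (m[0][0] - rng) ** 2 +
                                           (m[0][1] - bearing) ** 2)[:k]
    return sum(a for _, a in nearest) / len(nearest)

def run_episode(policy, memory=None, steps=100):
    """Play one pursuit; optionally record (state, action) teacher examples."""
    px, py, ph = 0.0, 0.0, 0.0
    ex, ey = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    for _ in range(steps):
        s = relative_state(px, py, ph, ex, ey)
        if s[0] < CAPTURE_RADIUS:
            return True
        a = policy(s)
        if memory is not None:
            memory.append((s, a))
        ph += a
        px += PURSUER_SPEED * math.cos(ph)
        py += PURSUER_SPEED * math.sin(ph)
        flee = math.atan2(ey - py, ex - px)        # evader runs directly away
        ex += EVADER_SPEED * math.cos(flee)
        ey += EVADER_SPEED * math.sin(flee)
    return False

# Phase 1 (teaching): the teacher plays and its examples populate memory.
memory = []
for _ in range(50):
    run_episode(teacher_action, memory)

# Phase 2 (lazy learning): k-NN acts using only the stored examples.
wins = sum(run_episode(lambda s: knn_action(memory, s)) for _ in range(100))
print(f"k-NN captures: {wins}/100 using {len(memory)} stored examples")

Under these assumptions the stored memory stays modest (a few thousand examples from fifty teacher episodes), which echoes the abstract's point that a well-taught lazy learner can perform well with relatively small memory requirements.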




Cite this article

Sheppard, J.W., Salzberg, S.L. A Teaching Strategy for Memory-Based Control. Artificial Intelligence Review 11, 343–370 (1997). https://doi.org/10.1023/A:1006597715165
