Abstract
In this paper we survey the basics of reinforcement learning, generalization, and abstraction. We begin with an introduction to the fundamentals of reinforcement learning and motivate the need for generalization and abstraction. We then summarize the most important techniques for achieving both in reinforcement learning, discussing basic function approximation methods before delving into hierarchical, relational, and transfer learning. All concepts and techniques are illustrated with examples.