Looking Back
This book has provided the reader with a thorough description of the field of reinforcement learning (RL). In this final chapter we first reflect on what the book has accomplished, and then describe topics that were left out, mainly because they fall outside the core field of RL or constitute small (possibly novel and emerging) subfields within it. After looking back at what has been done in RL and in this book, we take a step toward the future development of the field, and we end with the opinions of several of the authors on what they believe will become the most important areas of research in RL.
© 2012 Springer-Verlag Berlin Heidelberg
Wiering, M., van Otterlo, M. (2012). Conclusions, Future Directions and Outlook. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_19
Print ISBN: 978-3-642-27644-6
Online ISBN: 978-3-642-27645-3