
Conclusions, Future Directions and Outlook

Chapter

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Looking Back

This book has provided the reader with a thorough description of the field of reinforcement learning (RL). In this final chapter we first discuss what has been accomplished with this book, followed by a description of the topics that were left out, mainly because they lie outside the main field of RL or constitute small (possibly novel and emerging) subfields within it. After looking back at what has been done in RL and in this book, we take a step into the future development of the field, and we end with the opinions of some of the authors on what they think will become the most important areas of research in RL.




Author information


Correspondence to Marco Wiering.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wiering, M., van Otterlo, M. (2012). Conclusions, Future Directions and Outlook. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_19


  • DOI: https://doi.org/10.1007/978-3-642-27645-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

