
Conclusions, Future Directions and Outlook

Chapter

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Looking Back

This book has provided the reader with a thorough description of the field of reinforcement learning (RL). In this final chapter we first discuss what has been accomplished with this book, followed by a description of the topics that were left out, mainly because they lie outside the main field of RL or constitute small (possibly novel and emerging) subfields within it. After looking back at what has been done in RL and in this book, we take a step into the future development of the field, and we end with the opinions of some of the authors on what they think will become the most important areas of research in RL.




Author information


Correspondence to Marco Wiering.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wiering, M., van Otterlo, M. (2012). Conclusions, Future Directions and Outlook. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_19


  • DOI: https://doi.org/10.1007/978-3-642-27645-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

