Balancing Exploration and Exploitation in Forward Model Learning

Part of the book series: Studies in Systems, Decision and Control ((SSDC,volume 379))

Abstract

Forward model learning algorithms enable the application of simulation-based search methods in environments for which the forward model is unknown. Multiple studies have shown strong performance in game-playing and motion-control applications, in which forward model learning agents often required less training time while achieving performance similar to that of state-of-the-art reinforcement learning methods. However, several problems can emerge when the environment's true model is replaced with a learned approximation. While the true forward model allows accurate prediction of future time-steps, a learned forward model's predictions may be inaccurate. These inaccuracies become problematic when planning long action sequences, since the confidence in predicted time-steps decreases with increasing simulation depth. In this work, we explore methods for balancing risk and reward in decision-making with inaccurate forward models. To this end, we propose methods for measuring the variance of a forward model and the confidence in the predicted outcome of planned action sequences. Based on these metrics, we define methods for learning and using forward models that take their current prediction accuracy into account. The proposed methods have been tested on various motion-control tasks of the OpenAI Gym framework. Results show that information about the model's accuracy can be used to increase both the efficiency of the agent's training and its performance during evaluation.
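The confidence-aware planning idea outlined in the abstract can be illustrated with a short example. The following is a minimal sketch, not the chapter's actual implementation: it assumes an ensemble of learned forward models whose disagreement serves as a variance estimate, and it discounts rewards predicted deeper in a rollout by an exponentially decaying confidence term. All names (EnsembleForwardModel, evaluate_sequence, kappa) are illustrative assumptions, not identifiers from the chapter or its code repository.

```python
import numpy as np

class EnsembleForwardModel:
    """Ensemble of learned forward models; disagreement among the
    members serves as a proxy for the prediction variance."""

    def __init__(self, members):
        # members: callables mapping (state, action) -> predicted next state
        self.members = members

    def predict(self, state, action):
        preds = np.array([m(state, action) for m in self.members])
        mean_state = preds.mean(axis=0)
        # Average per-dimension variance across members as a single
        # scalar uncertainty estimate for this transition.
        variance = preds.var(axis=0).mean()
        return mean_state, variance


def evaluate_sequence(model, reward_fn, state, actions, kappa=1.0):
    """Score an action sequence with a confidence-weighted return.

    The confidence of each simulated step shrinks with the accumulated
    ensemble variance, so rewards predicted deep in the rollout (where
    the learned model is less reliable) contribute less to the score.
    """
    total, confidence = 0.0, 1.0
    for action in actions:
        state, variance = model.predict(state, action)
        confidence *= np.exp(-kappa * variance)  # decay with uncertainty
        total += confidence * reward_fn(state, action)
    return total
```

Within a rolling-horizon or Monte Carlo search, such a confidence-weighted return could replace the plain sum of predicted rewards, so that action sequences relying on deep, uncertain predictions rank below equally rewarding but more reliably predicted ones.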


Notes

  1. https://github.com/ADockhorn/Balancing-Exploration-And-Exploitation-in-Forward-Model-Learning


Author information

Corresponding author

Correspondence to Alexander Dockhorn.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dockhorn, A., Kruse, R. (2022). Balancing Exploration and Exploitation in Forward Model Learning. In: Sgurev, V., Jotsov, V., Kacprzyk, J. (eds) Advances in Intelligent Systems Research and Innovation. Studies in Systems, Decision and Control, vol 379. Springer, Cham. https://doi.org/10.1007/978-3-030-78124-8_1
