Balancing Exploration and Exploitation in Forward Model Learning

Part of the book series: Studies in Systems, Decision and Control ((SSDC,volume 379))

Abstract

Forward model learning algorithms enable the application of simulation-based search methods in environments for which the forward model is unknown. Multiple studies have shown strong performance in game-playing and motion-control applications, in which forward model learning agents often required less training time while achieving performance similar to that of state-of-the-art reinforcement learning methods. However, several problems can emerge when the environment's true model is replaced with a learned approximation. While the true forward model allows accurate prediction of future time-steps, a learned forward model's predictions may be inaccurate. These inaccuracies become problematic when planning long action sequences, since the confidence in predicted time-steps decreases with increasing simulation depth. In this work, we explore methods for balancing risk and reward in decision-making with inaccurate forward models. To this end, we propose methods for measuring the variance of a forward model and the confidence in the predicted outcome of planned action sequences. Based on these metrics, we define methods for learning and using forward models that take their current prediction accuracy into account. The proposed methods have been tested on various motion-control tasks of the OpenAI Gym framework. Results show that information about the model's accuracy can be used to increase both the efficiency of the agent's training and its performance during evaluation.
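The confidence-aware planning idea outlined in the abstract can be illustrated with a short example. The following is a minimal sketch, not the chapter's actual implementation: it assumes an ensemble of learned forward models whose disagreement serves as a variance estimate, and it discounts rewards predicted deeper in a rollout by an exponentially decaying confidence term. All names (EnsembleForwardModel, evaluate_sequence, kappa) are illustrative assumptions, not identifiers from the chapter or its code repository.

```python
import numpy as np

class EnsembleForwardModel:
    """Ensemble of learned forward models; disagreement among the
    members serves as a proxy for the prediction variance."""

    def __init__(self, members):
        # members: callables mapping (state, action) -> predicted next state
        self.members = members

    def predict(self, state, action):
        preds = np.array([m(state, action) for m in self.members])
        mean_state = preds.mean(axis=0)
        # Average per-dimension variance across members as a single
        # scalar uncertainty estimate for this transition.
        variance = preds.var(axis=0).mean()
        return mean_state, variance


def evaluate_sequence(model, reward_fn, state, actions, kappa=1.0):
    """Score an action sequence with a confidence-weighted return.

    The confidence of each simulated step shrinks with the accumulated
    ensemble variance, so rewards predicted deep in the rollout (where
    the learned model is less reliable) contribute less to the score.
    """
    total, confidence = 0.0, 1.0
    for action in actions:
        state, variance = model.predict(state, action)
        confidence *= np.exp(-kappa * variance)  # decay with uncertainty
        total += confidence * reward_fn(state, action)
    return total
```

Within a rolling-horizon or Monte Carlo search, such a confidence-weighted return could replace the plain sum of predicted rewards, so that action sequences relying on deep, uncertain predictions rank below equally rewarding but more reliably predicted ones.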


Notes

  1. https://github.com/ADockhorn/Balancing-Exploration-And-Exploitation-in-Forward-Model-Learning


Author information

Corresponding author

Correspondence to Alexander Dockhorn.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dockhorn, A., Kruse, R. (2022). Balancing Exploration and Exploitation in Forward Model Learning. In: Sgurev, V., Jotsov, V., Kacprzyk, J. (eds) Advances in Intelligent Systems Research and Innovation. Studies in Systems, Decision and Control, vol 379. Springer, Cham. https://doi.org/10.1007/978-3-030-78124-8_1
