Abstract
Forward model learning algorithms enable the application of simulation-based search methods in environments for which the forward model is unknown. Multiple studies have demonstrated strong performance in game-playing and motion-control applications, where forward model learning agents often required less training time while performing on par with state-of-the-art reinforcement learning methods. However, several problems can emerge when the environment's true model is replaced with a learned approximation. While the true forward model allows the accurate prediction of future time-steps, the predictions of a learned forward model are always subject to error. These inaccuracies become problematic when planning long action sequences, since confidence in the predicted time-steps decreases with increasing simulation depth. In this work, we explore methods for balancing risk and reward in decision-making with inaccurate forward models. To this end, we propose methods for measuring the variance of a forward model and the confidence in the predicted outcome of planned action sequences. Based on these metrics, we define methods for learning and using forward models that take their current prediction accuracy into account. The proposed methods have been tested on various motion-control tasks of the OpenAI Gym framework. Results show that information about the model's accuracy can be used to increase both the efficiency of the agent's training and the agent's performance during evaluation.
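The central idea of the abstract, that confidence in simulated time-steps shrinks with rollout depth and should temper the predicted reward, can be illustrated with a short sketch. All names here (EnsembleForwardModel, confidence_weighted_return, reward_fn) are hypothetical and not taken from the chapter; the sketch assumes the learned forward model is an ensemble of regressors whose disagreement serves as a variance estimate, one common way to quantify prediction uncertainty.

```python
import numpy as np


class EnsembleForwardModel:
    """Hypothetical learned forward model: an ensemble of fitted
    regressors mapping (state, action) -> next state. Member
    disagreement is used as a scalar variance estimate."""

    def __init__(self, members):
        # members: fitted regressors exposing .predict(X) on 2-D input,
        # e.g. scikit-learn models trained on concatenated state-action rows
        self.members = members

    def predict(self, state, action):
        x = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)])[None, :]
        preds = np.stack([m.predict(x)[0] for m in self.members])
        mean_next_state = preds.mean(axis=0)
        variance = float(preds.var(axis=0).mean())
        confidence = 1.0 / (1.0 + variance)  # maps variance to (0, 1]
        return mean_next_state, confidence


def confidence_weighted_return(model, reward_fn, state, actions):
    """Evaluate an action sequence on the learned model, down-weighting
    rewards of deeper (less trustworthy) simulated time-steps."""
    total, trust = 0.0, 1.0
    for action in actions:
        state, confidence = model.predict(state, action)
        trust *= confidence                  # compounds with simulation depth
        total += trust * reward_fn(state, action)
    return total
```

Inside a rolling-horizon or Monte Carlo tree-search planner, candidate action sequences would then be ranked by this confidence-weighted return instead of the raw predicted return, so a long sequence is only preferred over a short one when the model stays confident along the whole rollout.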