Abstract
Autonomous driving does not yet have an industry-standard approach; reinforcement learning is among the most promising candidates. MuZero, a recent model-based deep reinforcement learning algorithm, performs well in observation spaces of higher complexity than its predecessors could handle. As a step towards autonomous driving, this paper applies MuZero to racing on unseen race tracks using LIDAR observations. Because driving is inherently a continuous control problem, we also propose a modification to MuZero that supports a continuous action space, and we compare it against both the original discrete variant and a current benchmark reinforcement learning algorithm, Proximal Policy Optimization (PPO). We further propose and verify a progressive race-track generation scheme that enables the MuZero agent to achieve high rewards on tracks it has never seen. Results show that discrete MuZero and PPO reach similar peak performance, with discrete MuZero being considerably more data-efficient. Continuous MuZero also improves its policy during training, but stagnates at a lower peak performance.
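The abstract does not spell out how the continuous action space is supported. One established way to adapt Monte-Carlo tree search to continuous actions is progressive widening: child actions are sampled from a learned Gaussian policy prior, and the number of distinct children at a node grows with its visit count. The Python sketch below illustrates that idea only; it is not the authors' implementation, and all names (Node, select_or_expand, c_pw, alpha) are hypothetical.

    import math
    import random

    class Node:
        """A search-tree node over MuZero's learned latent state (sketch)."""
        def __init__(self, latent):
            self.latent = latent        # hidden state from the learned model
            self.children = []          # list of (action, Node) pairs
            self.visits = 0
            self.value_sum = 0.0

        def value(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def sample_action(policy):
        """Draw a continuous action (e.g. steering, throttle) from a
        Gaussian prior predicted by the policy head; policy = (means, stds)."""
        means, stds = policy
        return tuple(random.gauss(m, s) for m, s in zip(means, stds))

    def select_or_expand(node, policy, c_pw=1.0, alpha=0.5, c_ucb=1.25):
        """Progressive widening: cap the number of distinct child actions
        at c_pw * visits**alpha. Below the cap, sample a fresh action from
        the policy prior; at the cap, pick an existing child by UCB."""
        limit = max(1, int(c_pw * max(node.visits, 1) ** alpha))
        if len(node.children) < limit:
            action = sample_action(policy)
            child = Node(latent=None)   # latent filled in by the dynamics net
            node.children.append((action, child))
            return action, child
        # UCB over the finite set of already-sampled actions.
        log_n = math.log(max(node.visits, 1))
        def ucb(item):
            _, ch = item
            return ch.value() + c_ucb * math.sqrt(log_n / (ch.visits + 1))
        return max(node.children, key=ucb)

In such a scheme, the caller would apply the learned dynamics network to the returned (action, child) pair to populate the child's latent state, then back the predicted value up the search path, exactly as in discrete MuZero.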