
Autonomous Navigation Using Model-Based Reinforcement Learning

  • Conference paper
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2022)

Abstract

Autonomous driving does not yet have an industry-standard approach; reinforcement learning is among the more promising candidates. MuZero, a novel model-based deep reinforcement learning algorithm, performs well in observation spaces of higher complexity than its predecessors could handle. As a step towards autonomous driving, this paper employs MuZero for racing on unseen race tracks from LIDAR observations. Because driving is inherently a continuous control problem, we also propose a modification that gives MuZero a continuous action space and compare this continuous variant against both the original, discrete MuZero and a current benchmark reinforcement learning algorithm, Proximal Policy Optimization (PPO). In addition, we propose and verify a scheme that progressively generates race tracks, which enables the MuZero agent to achieve high rewards on tracks it has never seen. Results show that PPO and discrete MuZero reach similar peak performance, with the latter being far more data-efficient, while continuous MuZero improves its policy but stagnates at a lower peak performance.
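The abstract does not spell out how MuZero's search is adapted to a continuous action space. One common technique for continuous-action Monte-Carlo tree search, known from continuous upper-confidence trees and A0C, is progressive widening: a node may sample a new action from the learned policy only while its number of children stays below a budget that grows with its visit count. The sketch below is a minimal illustration under the assumption that the paper's modification works along these lines; every identifier (Node, policy_sample, select_action) and every constant (C_PW, ALPHA, C_UCB) is hypothetical and not taken from the paper.

    import math
    import random

    # Hypothetical sketch: progressive widening for continuous-action MCTS.
    # Constants and names are illustrative, not taken from the paper.
    C_PW = 1.0    # widening coefficient
    ALPHA = 0.5   # child budget grows as C_PW * n ** ALPHA
    C_UCB = 1.25  # exploration constant for child selection

    class Node:
        def __init__(self, hidden_state):
            self.hidden_state = hidden_state  # latent state from the learned model
            self.visit_count = 0
            self.children = []                # entries: [action, child, value_sum, visits]

    def policy_sample(hidden_state):
        # Placeholder for the policy head: sample a continuous action,
        # e.g. (steering, throttle), from a predicted Gaussian.
        return (random.gauss(0.0, 0.3), random.gauss(0.5, 0.2))

    def select_action(node):
        # Progressive widening: only add a new action while the child count
        # is below the visit-dependent budget; otherwise pick by UCB.
        budget = math.ceil(C_PW * max(node.visit_count, 1) ** ALPHA)
        if len(node.children) < budget:
            action = policy_sample(node.hidden_state)
            child = Node(hidden_state=None)   # would be expanded by the dynamics model
            node.children.append([action, child, 0.0, 0])
            return action, child

        def ucb(entry):
            _, _, value_sum, visits = entry
            q = value_sum / visits if visits else 0.0
            return q + C_UCB * math.sqrt(math.log(node.visit_count + 1) / (visits + 1))

        action, child, _, _ = max(node.children, key=ucb)
        return action, child

Because the child set stays finite at every visit count, the discrete UCB machinery of MuZero's search can be reused unchanged; only the action-sampling step would differ from the discrete variant.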



Author information

Corresponding author

Correspondence to Siemen Herremans.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Herremans, S. et al. (2023). Autonomous Navigation Using Model-Based Reinforcement Learning. In: Barolli, L. (ed.) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2022. Lecture Notes in Networks and Systems, vol 571. Springer, Cham. https://doi.org/10.1007/978-3-031-19945-5_27


  • DOI: https://doi.org/10.1007/978-3-031-19945-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19944-8

  • Online ISBN: 978-3-031-19945-5

  • eBook Packages: Engineering, Engineering (R0)
