
Behavior analysis of emergent rule discovery for cooperative automated driving using deep reinforcement learning

  • Original Article
  • Published in Artificial Life and Robotics

Abstract

With improvements in AI technology and sensor performance, research on automated driving has become increasingly active. However, most studies assume human-like driving styles. In this study, we consider an environment in which only autonomous vehicles are present. In such an environment, it is essential to develop control methods that actively exploit the characteristics of autonomous vehicles, such as dense information exchange and highly accurate vehicle control. To address this issue, we investigated the emergence of automated-driving rules using reinforcement learning based on information obtained from surrounding vehicles via inter-vehicle communication. We evaluated whether reinforcement learning converges when distance-sensor information is shared in real time through vehicle-to-vehicle communication, and whether it can learn a rational driving method. The simulation results show a positive trend in cumulative reward, indicating that the proposed multi-agent learning method with an extended own-vehicle environment has the potential to learn cooperative automated-vehicle control automatically. Furthermore, we analyzed whether rational action selection can be learned by reinforcement learning. The results showed that reinforcement learning achieves rational control of overtaking behavior.
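The core idea described in the abstract, extending each vehicle's own observation with distance-sensor readings shared by surrounding vehicles over V2V communication, and learning a driving policy from the resulting cumulative reward, can be illustrated with a minimal sketch. This is not the authors' implementation (the paper uses deep multi-agent reinforcement learning in a simulator); all class names, discretisation bins, and hyperparameters below are hypothetical, and tabular Q-learning stands in for the deep learner purely to keep the example self-contained.

```python
# Hypothetical sketch: each agent observes its own distance sensors plus
# readings shared by neighbours via V2V, and learns with independent
# tabular Q-learning. All names and constants are illustrative.
import random

ACTIONS = ["keep_lane", "change_left", "change_right"]

def extended_observation(own_sensors, neighbour_sensors):
    """Concatenate own distance readings with V2V-shared neighbour readings,
    coarsely discretised so they can index a Q-table."""
    readings = own_sensors + [r for ns in neighbour_sensors for r in ns]
    return tuple(min(int(r // 10), 5) for r in readings)  # 10 m bins, capped

class VehicleAgent:
    def __init__(self, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = {}  # (observation, action) -> estimated value
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, obs):
        # Epsilon-greedy action selection over the extended observation.
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((obs, a), 0.0))

    def update(self, obs, action, reward, next_obs):
        # Standard Q-learning update; cumulative reward trends upward
        # as cooperative behaviour (e.g. yielding for overtaking) emerges.
        best_next = max(self.q.get((next_obs, a), 0.0) for a in ACTIONS)
        old = self.q.get((obs, action), 0.0)
        self.q[(obs, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)
```

In the paper's setting, the reward would encode progress and collision avoidance, and the "extended own-vehicle environment" corresponds to the observation concatenation shown in `extended_observation`.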



Corresponding author

Correspondence to Kiyohiko Hattori.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).

About this article


Cite this article

Harada, T., Matsuoka, J. & Hattori, K. Behavior analysis of emergent rule discovery for cooperative automated driving using deep reinforcement learning. Artif Life Robotics 28, 31–42 (2023). https://doi.org/10.1007/s10015-022-00839-7

