Abstract
With improvements in AI technology and sensor performance, research on automated driving has become increasingly active. However, most studies assume vehicles that drive in a human-like style. In this study, we consider an environment in which only autonomous vehicles are present. In such an environment, it is essential to develop control methods that actively exploit the characteristics of autonomous vehicles, such as dense information exchange and highly accurate vehicle control. To address this issue, we investigated the emergence of automated driving rules through reinforcement learning based on information obtained from surrounding vehicles via inter-vehicle communication. We evaluated whether reinforcement learning converges when distance-sensor information can be shared in real time through vehicle-to-vehicle communication, and whether it can learn a rational driving method. The simulation results show a positive trend in the cumulative reward, indicating that the proposed multi-agent learning method with an extended own-vehicle environment has the potential to automatically learn vehicle control with cooperative behavior. Furthermore, we analyzed whether a rational driving method (action selection) can be acquired through reinforcement learning. The simulation results showed that reinforcement learning achieves rational control of overtaking behavior.
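The "extended own-vehicle environment" described above can be illustrated with a minimal sketch: each agent's observation vector concatenates its own distance-sensor readings with readings shared by nearby vehicles over V2V communication, zero-padded to a fixed length so it can feed a standard RL policy network. The function name, sensor layout, and neighbor limit below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_extended_observation(own_sensors, neighbor_sensors, max_neighbors=3):
    """Sketch of an extended own-vehicle observation (assumed layout).

    own_sensors      : ego vehicle's distance-sensor readings.
    neighbor_sensors : list of sensor vectors received via V2V, one per
                       nearby vehicle (may be shorter than max_neighbors).
    Missing neighbor slots are zero-padded so the observation length is
    fixed, as most RL policy networks require a constant input size.
    """
    n = len(own_sensors)
    parts = [np.asarray(own_sensors, dtype=np.float32)]
    for i in range(max_neighbors):
        if i < len(neighbor_sensors):
            parts.append(np.asarray(neighbor_sensors[i], dtype=np.float32))
        else:
            parts.append(np.zeros(n, dtype=np.float32))  # pad absent neighbor
    return np.concatenate(parts)

# Example: 3 ego sensor readings, one neighbor sharing its readings.
obs = build_extended_observation([0.8, 0.5, 0.9], [[0.4, 0.7, 0.6]])
```

With `max_neighbors=3` and three sensors per vehicle, the observation is a fixed 12-element vector regardless of how many neighbors are currently in communication range.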
This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).
Harada, T., Matsuoka, J. & Hattori, K. Behavior analysis of emergent rule discovery for cooperative automated driving using deep reinforcement learning. Artif Life Robotics 28, 31–42 (2023). https://doi.org/10.1007/s10015-022-00839-7