Abstract
Deep reinforcement learning has been successfully applied to the generation of goal-directed behavior in artificial agents. However, existing algorithms are often not designed to reproduce human-like behavior, which may be desired in many environments, such as human–robot collaboration, social robotics and autonomous vehicles. Here we introduce a model-free and easy-to-implement deep reinforcement learning approach to mimic the stochastic behavior of a human expert by learning distributions of task variables from examples. As tractable use cases, we study static and dynamic obstacle avoidance tasks for an autonomous vehicle on a highway road in simulation (Unity). Our control algorithm receives a feedback signal from two sources: a deterministic (handcrafted) part encoding basic task goals and a stochastic (data-driven) part that incorporates human expert knowledge. Gaussian processes are used to model human state distributions and to assess the similarity between machine and human behavior. Using this generic approach, we demonstrate that the learning agent acquires human-like driving skills and can generalize to new roads and obstacle distributions unseen during training.
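For concreteness, the snippet below sketches how such a two-part feedback signal could be composed: a handcrafted task term plus a data-driven term that scores how likely the agent's state is under a Gaussian process fitted to human demonstrations. It is a minimal illustration, not the implementation used in this work; the synthetic demonstration data, kernel choice, task variables (longitudinal and lateral road position) and weighting are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for human expert demonstrations: lateral road position
# as a function of longitudinal position (illustrative task variables only).
x_human = np.linspace(0.0, 100.0, 50)[:, None]
y_human = 0.5 * np.sin(x_human[:, 0] / 10.0) + 0.05 * np.random.randn(50)

# Gaussian process modelling the human state distribution.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01))
gp.fit(x_human, y_human)

def combined_reward(r_task, s_long, s_lat, w=0.5):
    """Deterministic (handcrafted) task term plus a stochastic (data-driven) term."""
    mean, std = gp.predict(np.array([[s_long]]), return_std=True)
    # Reward states that are likely under the human state distribution.
    r_human = float(np.exp(-0.5 * ((s_lat - mean[0]) / std[0]) ** 2))
    return r_task + w * r_human
```

In the full algorithm, a reward composed in this way is simply passed to a standard policy-gradient learner such as PPO, leaving the rest of the reinforcement learning loop unchanged.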





Notes
Video 1: Generalization I: Comparison of the performance of the vRL- and mRL-agents in an obstacle avoidance task on road track 1 with a random obstacle distribution.
Video 2: Generalization II: Testing the vRL-agent in an obstacle avoidance task on road track 1 with three different obstacle distributions (A: random, B: Gaussian, C: batch).
Video 3: Testing the vRL-agent in an overtaking task on road track 2. The same learning algorithm (PPO) and network architecture (except for the input layers) as for the obstacle avoidance task were used.
Funding
This research was supported in part by the Helmsley Charitable Trust through the Agricultural, Biological and Cognitive Robotics Initiative and by the Marcus Endowment Fund, both at Ben-Gurion University of the Negev. This research was also supported by the Israel Science Foundation (Grant No. 1627/17) and the Leona M. and Harry B. Helmsley Charitable Trust.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Hyperparameters
See Table 4.
Appendix B: Dynamic batch size update

The (Boolean) parameter \(\textit{completed\_rounds}\) indicates whether the agent has completed two rounds. The parameter \(\textit{max\_length}\) is the maximum number of consecutive steps the agent has taken from the start of an episode to its end. The parameter \(\textit{rem}\) is the number of steps remaining to reach twice \(\textit{max\_length}\); it guarantees that the new batch (B) can be divided into equal mini-batches (MB). B and MB were initialized to 512 and 256, respectively.
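A minimal Python sketch of one possible realization of this update is shown below; it assumes the new batch size is obtained by padding twice \(\textit{max\_length}\) with \(\textit{rem}\) steps up to the next multiple of the mini-batch size, which may differ in detail from the exact rule used in our experiments.

```python
def update_batch_size(completed_rounds, max_length,
                      batch_size=512, mini_batch_size=256):
    """Dynamic batch size update (illustrative sketch).

    completed_rounds -- Boolean; True once the agent has completed two rounds.
    max_length       -- longest run of consecutive steps from episode start to end.
    """
    if not completed_rounds:
        # Keep the current batch size until the agent completes two rounds.
        return batch_size, mini_batch_size
    target = 2 * max_length              # twice the longest episode so far
    rem = (-target) % mini_batch_size    # steps remaining to the next multiple of MB
    new_batch = target + rem             # divisible into equal mini-batches
    return new_batch, mini_batch_size
```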
Appendix C: Visual Turing test
Video A: human, video B: robot.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Emuna, R., Duffney, R., Borowsky, A. et al. Example-guided learning of stochastic human driving policies using deep reinforcement learning. Neural Comput & Applic 35, 16791–16804 (2023). https://doi.org/10.1007/s00521-022-07947-2