
World Model Learning from Demonstrations with Active Inference: Application to Driving Behavior

  • Conference paper
  • In: Active Inference (IWAI 2022)

Abstract

Active inference proposes a unifying principle for perception and action as jointly minimizing the free energy of an agent’s internal world model. In the active inference literature, world models are typically pre-specified or learned through interacting with an environment. This paper explores the possibility of learning the world models of active inference agents from recorded demonstrations, with an application to human driving behavior modeling. The results show that the presented method can create models that generate human-like driving behavior, but that the approach is sensitive to the choice of input features.

Notes

  1. Recording 007 from location “DR_CHN_Merging_ZS”.

References

  1. Baker, C., Saxe, R., Tenenbaum, J.: Bayesian theory of mind: modeling joint belief-desire attribution. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 33 (2011)

  2. Bhattacharyya, R., et al.: Modeling human driving behavior through generative adversarial imitation learning. arXiv preprint arXiv:2006.06412 (2020)

  3. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)

  4. De Haan, P., Jayaraman, D., Levine, S.: Causal confusion in imitation learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  5. Engström, J., et al.: Great expectations: a predictive processing account of automobile driving. Theor. Issues Ergon. Sci. 19(2), 156–194 (2018)

  6. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: a process theory. Neural Comput. 29(1), 1–49 (2017)

  7. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861–1870. PMLR (2018)

  8. Janner, M., Fu, J., Zhang, M., Levine, S.: When to trust your model: model-based policy optimization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  9. Karkus, P., Hsu, D., Lee, W.S.: QMDP-Net: deep learning for planning under partial observability. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  10. Kujala, T., Lappi, O.: Inattention and uncertainty in the predictive brain. Front. Neuroergon. 2, 718699 (2021)

  11. Kwon, M., Daptardar, S., Schrater, P.R., Pitkow, X.: Inverse rational control with partially observable continuous nonlinear dynamics. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7898–7909 (2020)

  12. Lambert, N., Amos, B., Yadan, O., Calandra, R.: Objective mismatch in model-based reinforcement learning. arXiv preprint arXiv:2002.04523 (2020)

  13. Leurent, E.: An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env (2018)

  14. Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: scaling up. In: Machine Learning Proceedings 1995, pp. 362–370. Elsevier (1995)

  15. Makino, T., Takeuchi, J.: Apprenticeship learning for model parameters of partially observable environments. arXiv preprint arXiv:1206.6484 (2012)

  16. Markkula, G., Boer, E., Romano, R., Merat, N.: Sustained sensorimotor control as intermittent decisions about prediction errors: computational framework and application to ground vehicle steering. Biol. Cybern. 112(3), 181–207 (2018)

  17. Markkula, G., Engström, J., Lodin, J., Bärgman, J., Victor, T.: A farewell to brake reaction times? Kinematics-dependent brake response in naturalistic rear-end emergencies. Accid. Anal. Prev. 95, 209–226 (2016)

  18. McDonald, A.D., et al.: Toward computational simulations of behavior during automated driving takeovers: a review of the empirical and modeling literatures. Hum. Factors 61(4), 642–688 (2019)

  19. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)

  20. Ng, A.Y., Russell, S.J., et al.: Algorithms for inverse reinforcement learning. In: ICML, vol. 1, p. 2 (2000)

  21. Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A: Math. Phys. Eng. Sci. 469(2153), 20120683 (2013)

  22. Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.: An algorithmic perspective on imitation learning. Found. Trends Rob. 7, 1–179 (2018)

  23. Reddy, S., Dragan, A., Levine, S.: Where do you think you’re going?: inferring beliefs about dynamics from behavior. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  24. Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. JMLR Workshop and Conference Proceedings (2011)

  25. Salvucci, D.D., Gray, R.: A two-point visual control model of steering. Perception 33(10), 1233–1248 (2004)

  26. Schwartenbeck, P., et al.: Optimal inference with suboptimal models: addiction and active Bayesian inference. Med. Hypotheses 84(2), 109–117 (2015)

  27. Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  28. Tishby, N., Polani, D.: Information Theory of Decisions and Actions, pp. 601–636. Springer, New York (2011)

  29. Tschantz, A., Baltieri, M., Seth, A.K., Buckley, C.L.: Scaling active inference. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

  30. Tschantz, A., Seth, A.K., Buckley, C.L.: Learning action-oriented models through active inference. PLoS Comput. Biol. 16(4), e1007805 (2020)

  31. Wei, R., McDonald, A.D., Garcia, A., Alambeigi, H.: Modeling driver responses to automation failures with active inference. IEEE Trans. Intell. Transp. Syst. (2022)

  32. Zhan, W., et al.: INTERACTION dataset: an international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088 (2019)

  33. Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K., et al.: Maximum entropy inverse reinforcement learning. In: AAAI, vol. 8, pp. 1433–1438. Chicago, IL, USA (2008)

Acknowledgements

Support for this research was provided in part by grants from the U.S. Department of Transportation, University Transportation Centers Program to the Safety through Disruption University Transportation Center (451453-19C36), the U.S. Army Research Office (W911NF2210213), and the U.K. Engineering and Physical Sciences Research Council (EP/S005056/1). The role of the Waymo employees in the project was solely consulting, including making suggestions and helping to set the technical direction.

Author information

Corresponding author

Correspondence to Ran Wei.

A Appendix

A.1 Dataset

We used the INTERACTION dataset [32], a publicly available naturalistic driving dataset recorded with drone footage of fixed road segments, to fit a model of highway car-following behavior. Each recording in the dataset consists of the positions, velocities, and headings of all vehicles in the road segment at a sampling frequency of 10 Hz. Specifically, we used a subset of the data (see Footnote 1) due to its abundance of car-following trajectories and relatively complex road geometry, with road curvature and merging lanes. We defined car-following as the trajectory segments from the initial appearance of a vehicle to either an ego lane change or the disappearance of the lead vehicle. Reducing the dataset using this definition resulted in a total of 1027 car-following trajectories with an average duration of 13 s and a standard deviation of 8.7 s. We obtained driver control actions (i.e., longitudinal and lateral accelerations) by differentiating the velocities of each trajectory. We then created a set of held-out trajectories for testing purposes by first categorizing all trajectories into four clusters based on their kinematic profiles using UMAP [19] and then sampling 15% of the trajectories from each cluster, as sketched below.
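The following is a minimal sketch of this held-out split. The paper states only that trajectories were grouped into four clusters using UMAP and that 15% of each cluster was held out; the summary-statistic kinematic features, the trajectory data layout, and the use of k-means on the UMAP embedding are assumptions made here for illustration.

import numpy as np
import umap                           # umap-learn
from sklearn.cluster import KMeans

def make_holdout_split(trajectories, holdout_frac=0.15, n_clusters=4, seed=0):
    """Cluster trajectories by kinematic profile and hold out a fraction of each cluster.

    `trajectories` is assumed to be a list of dicts with per-step "speed" and
    "accel" arrays; these features are illustrative, not taken from the paper.
    """
    feats = np.array([
        [t["speed"].mean(), t["speed"].std(), t["accel"].mean(), t["accel"].std()]
        for t in trajectories
    ])
    # Reduce the kinematic features with UMAP [19], then cluster (k-means assumed).
    emb = umap.UMAP(n_components=2, random_state=seed).fit_transform(feats)
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(emb)

    # Sample the held-out fraction independently within each cluster.
    rng = np.random.default_rng(seed)
    test_idx = set()
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        n_test = max(1, int(round(holdout_frac * len(members))))
        test_idx.update(rng.choice(members, size=n_test, replace=False).tolist())

    train = [t for i, t in enumerate(trajectories) if i not in test_idx]
    test = [t for i, t in enumerate(trajectories) if i in test_idx]
    return train, test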

A.2 Behavior Cloning Agent

Each behavior cloning (BC) agent consists of a recurrent neural network with a single gated recurrent unit (GRU) layer followed by a feed-forward neural network. The GRU layer compresses the observation history into a fixed-size vector, which the feed-forward network decodes into a continuous action distribution modeled as a multivariate Gaussian. To make the BC agents comparable to the active inference agents, the GRU has 64 hidden units and 30 output units, and the feed-forward network has 30 input units, 2 hidden layers with 64 hidden units each, and SiLU activation functions. We used the same observation vector as input to the BC agents as to the active inference agents.
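A minimal sketch of this architecture is given below, assuming PyTorch. The interpretation of the GRU’s 30 output units as a linear projection of its hidden state and the diagonal parameterization of the Gaussian are assumptions made for illustration; the paper does not specify these details.

import torch
import torch.nn as nn

class BCAgent(nn.Module):
    """Sketch of the BC agent: GRU history encoder + MLP decoder over actions."""

    def __init__(self, obs_dim, act_dim=2, gru_hidden=64, belief_dim=30, mlp_hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, gru_hidden, batch_first=True)
        # The "30 output units" are interpreted here as a linear projection of the GRU state.
        self.proj = nn.Linear(gru_hidden, belief_dim)
        self.mlp = nn.Sequential(
            nn.Linear(belief_dim, mlp_hidden), nn.SiLU(),
            nn.Linear(mlp_hidden, mlp_hidden), nn.SiLU(),
        )
        self.mean = nn.Linear(mlp_hidden, act_dim)      # longitudinal and lateral acceleration
        self.log_std = nn.Linear(mlp_hidden, act_dim)   # diagonal Gaussian (assumption)

    def forward(self, obs_seq):
        # obs_seq: [batch, time, obs_dim] -> per-step Gaussian over actions
        h, _ = self.gru(obs_seq)
        feat = self.mlp(self.proj(h))
        return torch.distributions.Normal(self.mean(feat), self.log_std(feat).exp())

# Behavior cloning maximizes the likelihood of the recorded actions, e.g.:
#   dist = agent(obs_seq)
#   loss = -dist.log_prob(action_seq).sum(-1).mean()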

A.3 Sample Path

Figure 3 shows example sample paths generated by the agents with and without the two-point observation model.

Fig. 3. Active inference agent sample path comparison.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wei, R. et al. (2023). World Model Learning from Demonstrations with Active Inference: Application to Driving Behavior. In: Buckley, C.L., et al. Active Inference. IWAI 2022. Communications in Computer and Information Science, vol 1721. Springer, Cham. https://doi.org/10.1007/978-3-031-28719-0_9

  • DOI: https://doi.org/10.1007/978-3-031-28719-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28718-3

  • Online ISBN: 978-3-031-28719-0
