Abstract
Active inference proposes a unifying principle for perception and action, casting both as jointly minimizing the free energy of an agent's internal world model. In the active inference literature, world models are typically pre-specified or learned through interaction with an environment. This paper explores the possibility of learning the world models of active inference agents from recorded demonstrations, with an application to modeling human driving behavior. The results show that the presented method can produce models that generate human-like driving behavior, but that the approach is sensitive to the choice of input features.
Notes
1. Recording 007 from location “DR_CHN_Merging_ZS”.
References
Baker, C., Saxe, R., Tenenbaum, J.: Bayesian theory of mind: modeling joint belief-desire attribution. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 33 (2011)
Bhattacharyya, R., et al.: Modeling human driving behavior through generative adversarial imitation learning. arXiv preprint arXiv:2006.06412 (2020)
Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)
De Haan, P., Jayaraman, D., Levine, S.: Causal confusion in imitation learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Engström, J., et al.: Great expectations: a predictive processing account of automobile driving. Theor. Issues Ergon. Sci. 19(2), 156–194 (2018)
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: a process theory. Neural Comput. 29(1), 1–49 (2017)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861–1870. PMLR (2018)
Janner, M., Fu, J., Zhang, M., Levine, S.: When to trust your model: model-based policy optimization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Karkus, P., Hsu, D., Lee, W.S.: QMDP-Net: deep learning for planning under partial observability. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Kujala, T., Lappi, O.: Inattention and uncertainty in the predictive brain. Front. Neuroergon. 2, 718699 (2021)
Kwon, M., Daptardar, S., Schrater, P.R., Pitkow, X.: Inverse rational control with partially observable continuous nonlinear dynamics. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7898–7909 (2020)
Lambert, N., Amos, B., Yadan, O., Calandra, R.: Objective mismatch in model-based reinforcement learning. arXiv preprint arXiv:2002.04523 (2020)
Leurent, E.: An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env (2018)
Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: scaling up. In: Machine Learning Proceedings 1995, pp. 362–370. Elsevier (1995)
Makino, T., Takeuchi, J.: Apprenticeship learning for model parameters of partially observable environments. arXiv preprint arXiv:1206.6484 (2012)
Markkula, G., Boer, E., Romano, R., Merat, N.: Sustained sensorimotor control as intermittent decisions about prediction errors: computational framework and application to ground vehicle steering. Biol. Cybern. 112(3), 181–207 (2018)
Markkula, G., Engström, J., Lodin, J., Bärgman, J., Victor, T.: A farewell to brake reaction times? Kinematics-dependent brake response in naturalistic rear-end emergencies. Accid. Anal. Prev. 95, 209–226 (2016)
McDonald, A.D., et al.: Toward computational simulations of behavior during automated driving takeovers: a review of the empirical and modeling literatures. Hum. Factors 61(4), 642–688 (2019)
McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)
Ng, A.Y., Russell, S.J., et al.: Algorithms for inverse reinforcement learning. In: ICML, vol. 1, p. 2 (2000)
Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A: Math. Phys. Eng. Sci. 469(2153), 20120683 (2013)
Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.: An algorithmic perspective on imitation learning. Found. Trends Rob. 7, 1–179 (2018)
Reddy, S., Dragan, A., Levine, S.: Where do you think you’re going?: inferring beliefs about dynamics from behavior. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. JMLR Workshop and Conference Proceedings (2011)
Salvucci, D.D., Gray, R.: A two-point visual control model of steering. Perception 33(10), 1233–1248 (2004)
Schwartenbeck, P., et al.: Optimal inference with suboptimal models: addiction and active Bayesian inference. Med. Hypotheses 84(2), 109–117 (2015)
Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Tishby, N., Polani, D.: Information Theory of Decisions and Actions, pp. 601–636. Springer, New York (2011)
Tschantz, A., Baltieri, M., Seth, A.K., Buckley, C.L.: Scaling active inference. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)
Tschantz, A., Seth, A.K., Buckley, C.L.: Learning action-oriented models through active inference. PLoS Comput. Biol. 16(4), e1007805 (2020)
Wei, R., McDonald, A.D., Garcia, A., Alambeigi, H.: Modeling driver responses to automation failures with active inference. IEEE Trans. Intell. Transp. Syst. (2022)
Zhan, W., et al.: Interaction dataset: an international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088 (2019)
Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K., et al.: Maximum entropy inverse reinforcement learning. In: AAAI, vol. 8, pp. 1433–1438. Chicago, IL, USA (2008)
Acknowledgements
Support for this research was provided in part by grants from the U.S. Department of Transportation, University Transportation Centers Program to the Safety through Disruption University Transportation Center (451453-19C36), the U.S. Army Research Office (W911NF2210213), and the U.K. Engineering and Physical Sciences Research Council (EP/S005056/1). The role of the Waymo employees in the project was solely consultative, including making suggestions and helping to set the technical direction.
A Appendix
A.1 Dataset
We used the INTERACTION dataset [32], a publicly available naturalistic driving dataset recorded from drone footage of fixed road segments, to fit a model of highway car-following behavior. Each recording in the dataset contains the positions, velocities, and headings of all vehicles in the road segment at a sampling frequency of 10 Hz. We used a subset of the data (see Note 1) because of its abundance of car-following trajectories and its relatively complex road geometry, with road curvature and merging lanes. We defined a car-following episode as the trajectory segment from the initial appearance of a vehicle to either an ego lane change or the disappearance of the lead vehicle. Reducing the dataset using this definition yielded 1027 car-following trajectories with an average duration of 13 s (standard deviation 8.7 s). We obtained driver control actions (i.e., longitudinal and lateral accelerations) by differentiating the velocities of each trajectory. We then created a set of held-out trajectories for testing by first grouping all trajectories into four clusters based on their kinematic profiles using UMAP [19] and then sampling 15% of the trajectories from each cluster, as sketched below.
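The following sketch illustrates one way to construct this held-out split: embed per-trajectory kinematic profiles with UMAP, group the embedding into four clusters, and hold out 15% of each cluster. The use of k-means for the clustering step, the feature construction, and the names `kinematic_profiles` and `held_out_split` are our assumptions; the text specifies only that UMAP and four clusters were used.

```python
# Hypothetical sketch of the held-out split: UMAP embedding of per-trajectory
# kinematic profiles, four clusters (k-means assumed), 15% held out per cluster.
import numpy as np
import umap
from sklearn.cluster import KMeans

def held_out_split(kinematic_profiles: np.ndarray, test_frac: float = 0.15, seed: int = 0):
    """kinematic_profiles: (n_trajectories, n_features) array of per-trajectory
    kinematic summaries (e.g., resampled speed/acceleration profiles)."""
    rng = np.random.default_rng(seed)

    # Reduce the kinematic profiles to a low-dimensional embedding.
    embedding = umap.UMAP(n_components=2, random_state=seed).fit_transform(kinematic_profiles)

    # Group trajectories into four clusters in the embedded space.
    labels = KMeans(n_clusters=4, n_init=10, random_state=seed).fit_predict(embedding)

    # Sample 15% of the trajectories from each cluster as the held-out test set.
    test_idx = []
    for c in range(4):
        members = np.flatnonzero(labels == c)
        n_test = max(1, int(round(test_frac * len(members))))
        test_idx.extend(rng.choice(members, size=n_test, replace=False))
    test_idx = np.sort(np.asarray(test_idx))
    train_idx = np.setdiff1d(np.arange(len(kinematic_profiles)), test_idx)
    return train_idx, test_idx
```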
A.2 Behavior Cloning Agent
The behavior cloning (BC) agents consist of a recurrent neural network with a single gated recurrent unit (GRU) layer followed by a feed-forward neural network. The GRU layer compresses the observation history into a fixed-size vector, which the feed-forward network decodes into a continuous action distribution modeled as a multivariate Gaussian. To make the BC agents comparable to the active inference agents, the GRU has 64 hidden units and 30 output units, and the feed-forward network has 30 input units, two hidden layers of 64 units each, and SiLU activation functions. The BC agents receive the same observation vector as the active inference agents.
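A minimal PyTorch sketch of this architecture is given below, assuming batch-first observation sequences and a diagonal-covariance Gaussian output head; the class name `BCAgent`, the log-std parameterization, and the training objective shown in the final comment are our assumptions rather than details stated in the paper.

```python
# Minimal sketch of the BC agent: GRU (64 hidden, 30 outputs) + feed-forward
# decoder (30 inputs, two 64-unit SiLU layers) producing a Gaussian over actions.
import torch
import torch.nn as nn

class BCAgent(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int = 2):
        super().__init__()
        self.gru = nn.GRU(obs_dim, 64, batch_first=True)  # compress observation history
        self.gru_out = nn.Linear(64, 30)                   # 30-unit summary of the history
        self.decoder = nn.Sequential(                      # feed-forward decoder
            nn.Linear(30, 64), nn.SiLU(),
            nn.Linear(64, 64), nn.SiLU(),
        )
        self.mean = nn.Linear(64, act_dim)                 # Gaussian mean
        self.log_std = nn.Linear(64, act_dim)              # diagonal Gaussian log std

    def forward(self, obs_seq: torch.Tensor) -> torch.distributions.Distribution:
        """obs_seq: (batch, time, obs_dim) observation history."""
        h, _ = self.gru(obs_seq)
        feat = self.decoder(self.gru_out(h))
        std = torch.exp(self.log_std(feat)).clamp(min=1e-3)
        # Multivariate Gaussian over longitudinal and lateral acceleration.
        return torch.distributions.Independent(
            torch.distributions.Normal(self.mean(feat), std), 1
        )

# Behavior cloning objective (assumed): maximize the log-likelihood of the
# demonstrated actions, i.e. loss = -agent(obs_seq).log_prob(action_seq).mean()
```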
A.3 Sample Path
Example sample paths generated by the agents with and without the two-point observation model.