
World Model Learning from Demonstrations with Active Inference: Application to Driving Behavior

  • Conference paper
  • In: Active Inference (IWAI 2022)

Abstract

Active inference proposes a unifying principle for perception and action as jointly minimizing the free energy of an agent’s internal world model. In the active inference literature, world models are typically pre-specified or learned through interacting with an environment. This paper explores the possibility of learning the world models of active inference agents from recorded demonstrations, with an application to human driving behavior modeling. The results show that the presented method can create models that generate human-like driving behavior, but that the approach is sensitive to the choice of input features.

Notes

  1. Recording 007 from location “DR_CHN_Merging_ZS”.

References

  1. Baker, C., Saxe, R., Tenenbaum, J.: Bayesian theory of mind: modeling joint belief-desire attribution. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 33 (2011)

  2. Bhattacharyya, R., et al.: Modeling human driving behavior through generative adversarial imitation learning. arXiv preprint arXiv:2006.06412 (2020)

  3. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)

  4. De Haan, P., Jayaraman, D., Levine, S.: Causal confusion in imitation learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  5. Engström, J., et al.: Great expectations: a predictive processing account of automobile driving. Theor. Issues Ergon. Sci. 19(2), 156–194 (2018)

  6. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: a process theory. Neural Comput. 29(1), 1–49 (2017)

  7. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861–1870. PMLR (2018)

  8. Janner, M., Fu, J., Zhang, M., Levine, S.: When to trust your model: model-based policy optimization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  9. Karkus, P., Hsu, D., Lee, W.S.: QMDP-Net: deep learning for planning under partial observability. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  10. Kujala, T., Lappi, O.: Inattention and uncertainty in the predictive brain. Front. Neuroergon. 2, 718699 (2021)

  11. Kwon, M., Daptardar, S., Schrater, P.R., Pitkow, X.: Inverse rational control with partially observable continuous nonlinear dynamics. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7898–7909 (2020)

  12. Lambert, N., Amos, B., Yadan, O., Calandra, R.: Objective mismatch in model-based reinforcement learning. arXiv preprint arXiv:2002.04523 (2020)

  13. Leurent, E.: An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env (2018)

  14. Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: scaling up. In: Machine Learning Proceedings 1995, pp. 362–370. Elsevier (1995)

  15. Makino, T., Takeuchi, J.: Apprenticeship learning for model parameters of partially observable environments. arXiv preprint arXiv:1206.6484 (2012)

  16. Markkula, G., Boer, E., Romano, R., Merat, N.: Sustained sensorimotor control as intermittent decisions about prediction errors: computational framework and application to ground vehicle steering. Biol. Cybern. 112(3), 181–207 (2018)

  17. Markkula, G., Engström, J., Lodin, J., Bärgman, J., Victor, T.: A farewell to brake reaction times? Kinematics-dependent brake response in naturalistic rear-end emergencies. Accid. Anal. Prev. 95, 209–226 (2016)

  18. McDonald, A.D., et al.: Toward computational simulations of behavior during automated driving takeovers: a review of the empirical and modeling literatures. Hum. Factors 61(4), 642–688 (2019)

  19. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)

  20. Ng, A.Y., Russell, S.J., et al.: Algorithms for inverse reinforcement learning. In: ICML, vol. 1, p. 2 (2000)

  21. Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A: Math. Phys. Eng. Sci. 469(2153), 20120683 (2013)

  22. Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.: An algorithmic perspective on imitation learning. Found. Trends Rob. 7, 1–179 (2018)

  23. Reddy, S., Dragan, A., Levine, S.: Where do you think you’re going?: inferring beliefs about dynamics from behavior. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  24. Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. JMLR Workshop and Conference Proceedings (2011)

  25. Salvucci, D.D., Gray, R.: A two-point visual control model of steering. Perception 33(10), 1233–1248 (2004)

  26. Schwartenbeck, P., et al.: Optimal inference with suboptimal models: addiction and active Bayesian inference. Med. Hypotheses 84(2), 109–117 (2015)

  27. Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  28. Tishby, N., Polani, D.: Information Theory of Decisions and Actions, pp. 601–636. Springer, New York (2011)

  29. Tschantz, A., Baltieri, M., Seth, A.K., Buckley, C.L.: Scaling active inference. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

  30. Tschantz, A., Seth, A.K., Buckley, C.L.: Learning action-oriented models through active inference. PLoS Comput. Biol. 16(4), e1007805 (2020)

  31. Wei, R., McDonald, A.D., Garcia, A., Alambeigi, H.: Modeling driver responses to automation failures with active inference. IEEE Trans. Intell. Transp. Syst. (2022)

  32. Zhan, W., et al.: INTERACTION dataset: an international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088 (2019)

  33. Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K., et al.: Maximum entropy inverse reinforcement learning. In: AAAI, vol. 8, pp. 1433–1438. Chicago, IL, USA (2008)

Acknowledgements

Support for this research was provided in part by grants from the U.S. Department of Transportation, University Transportation Centers Program to the Safety through Disruption University Transportation Center (451453-19C36), the U.S. Army Research Office (W911NF2210213), and the U.K. Engineering and Physical Sciences Research Council (EP/S005056/1). The role of the Waymo employees in the project was solely consulting, including making suggestions and helping to set the technical direction.

Author information

Corresponding author

Correspondence to Ran Wei.

A Appendix

A.1 Dataset

We used the INTERACTION dataset [32], a publicly available naturalistic driving dataset recorded with drone footage of fixed road segments, to fit a model of highway car-following behavior. Each recording in the dataset consists of the positions, velocities, and headings of all vehicles in the road segment at a sampling frequency of 10 Hz. Specifically, we used a subset of the data (see Footnote 1) due to its abundance of car-following trajectories and relatively complex road geometry, with road curvature and merging lanes. We defined car-following as the trajectory segments from the initial appearance of a vehicle to either an ego lane change or the disappearance of the lead vehicle. Reducing the dataset using this definition resulted in a total of 1027 car-following trajectories with an average duration of 13 s and a standard deviation of 8.7 s. We obtained driver control actions (i.e., longitudinal and lateral accelerations) by differentiating the velocities of each trajectory. We then created a set of held-out trajectories for testing purposes by first categorizing all trajectories into four clusters based on their kinematic profiles using UMAP [19] and then sampling 15% of the trajectories from each cluster, as sketched below.
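The following is a minimal sketch of this held-out split. The paper states only that trajectories were grouped into four clusters using UMAP and that 15% of each cluster was held out; the summary-statistic kinematic features, the trajectory data layout, and the use of k-means on the UMAP embedding are assumptions made here for illustration.

import numpy as np
import umap                           # umap-learn
from sklearn.cluster import KMeans

def make_holdout_split(trajectories, holdout_frac=0.15, n_clusters=4, seed=0):
    """Cluster trajectories by kinematic profile and hold out a fraction of each cluster.

    `trajectories` is assumed to be a list of dicts with per-step "speed" and
    "accel" arrays; these features are illustrative, not taken from the paper.
    """
    feats = np.array([
        [t["speed"].mean(), t["speed"].std(), t["accel"].mean(), t["accel"].std()]
        for t in trajectories
    ])
    # Reduce the kinematic features with UMAP [19], then cluster (k-means assumed).
    emb = umap.UMAP(n_components=2, random_state=seed).fit_transform(feats)
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(emb)

    # Sample the held-out fraction independently within each cluster.
    rng = np.random.default_rng(seed)
    test_idx = set()
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        n_test = max(1, int(round(holdout_frac * len(members))))
        test_idx.update(rng.choice(members, size=n_test, replace=False).tolist())

    train = [t for i, t in enumerate(trajectories) if i not in test_idx]
    test = [t for i, t in enumerate(trajectories) if i in test_idx]
    return train, test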

A.2 Behavior Cloning Agent

Each behavior cloning (BC) agent consists of a recurrent neural network with a single gated recurrent unit (GRU) layer followed by a feed-forward neural network. The GRU layer compresses the observation history into a fixed-size vector, which the feed-forward network decodes into a continuous action distribution modeled as a multivariate Gaussian. To make the BC agents comparable to the active inference agents, the GRU has 64 hidden units and 30 output units, and the feed-forward network has 30 input units, 2 hidden layers with 64 hidden units each, and SiLU activation functions. We used the same observation vector as input to the BC agents as to the active inference agents.
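A minimal sketch of this architecture is given below, assuming PyTorch. The interpretation of the GRU’s 30 output units as a linear projection of its hidden state and the diagonal parameterization of the Gaussian are assumptions made for illustration; the paper does not specify these details.

import torch
import torch.nn as nn

class BCAgent(nn.Module):
    """Sketch of the BC agent: GRU history encoder + MLP decoder over actions."""

    def __init__(self, obs_dim, act_dim=2, gru_hidden=64, belief_dim=30, mlp_hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, gru_hidden, batch_first=True)
        # The "30 output units" are interpreted here as a linear projection of the GRU state.
        self.proj = nn.Linear(gru_hidden, belief_dim)
        self.mlp = nn.Sequential(
            nn.Linear(belief_dim, mlp_hidden), nn.SiLU(),
            nn.Linear(mlp_hidden, mlp_hidden), nn.SiLU(),
        )
        self.mean = nn.Linear(mlp_hidden, act_dim)      # longitudinal and lateral acceleration
        self.log_std = nn.Linear(mlp_hidden, act_dim)   # diagonal Gaussian (assumption)

    def forward(self, obs_seq):
        # obs_seq: [batch, time, obs_dim] -> per-step Gaussian over actions
        h, _ = self.gru(obs_seq)
        feat = self.mlp(self.proj(h))
        return torch.distributions.Normal(self.mean(feat), self.log_std(feat).exp())

# Behavior cloning maximizes the likelihood of the recorded actions, e.g.:
#   dist = agent(obs_seq)
#   loss = -dist.log_prob(action_seq).sum(-1).mean()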

A.3 Sample Path

Figure 3 shows example sample paths generated by the agents with and without the two-point observation model.

Fig. 3. Active inference agent sample path comparison.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wei, R. et al. (2023). World Model Learning from Demonstrations with Active Inference: Application to Driving Behavior. In: Buckley, C.L., et al. Active Inference. IWAI 2022. Communications in Computer and Information Science, vol 1721. Springer, Cham. https://doi.org/10.1007/978-3-031-28719-0_9

  • DOI: https://doi.org/10.1007/978-3-031-28719-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28718-3

  • Online ISBN: 978-3-031-28719-0
