Hierarchical Latent Structure for Multi-modal Vehicle Trajectory Forecasting

Choi, Dooseop; Min, KyoungWook

doi:10.1007/978-3-031-20047-2_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Included in the following conference series: European Conference on Computer Vision

Abstract

The variational autoencoder (VAE) has been widely used to model data distributions because it is theoretically elegant, easy to train, and has nice manifold representations. However, when applied to image reconstruction and synthesis tasks, the VAE tends to generate blurry samples. We observe that a similar problem, in which the generated trajectory lies between adjacent lanes, often arises in VAE-based trajectory forecasting models. To mitigate this problem, we introduce a hierarchical latent structure into the VAE-based forecasting model. Assuming that the trajectory distribution can be approximated as a mixture of simple distributions (or modes), we use the low-level latent variable to model each mode of the mixture and the high-level latent variable to represent the weights of the modes. To model each mode accurately, we condition the low-level latent variable on two lane-level context vectors computed in novel ways, one corresponding to vehicle-lane interaction and the other to vehicle-vehicle interaction. The context vectors are also used to model the weights via the proposed mode selection network. We evaluate our forecasting model on two large-scale real-world datasets. Experimental results show that our model not only generates clear multi-modal trajectory distributions but also outperforms state-of-the-art (SOTA) models in prediction accuracy. Our code is available at https://github.com/d1024choi/HLSTrajForecast.
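The two-level sampling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that the high-level variable is a categorical choice over modes with softmax weights and that each mode's low-level latent is a diagonal Gaussian. The function name `sample_hierarchical_latent` and all of its parameters are hypothetical, and the lane-level context vectors and the mode selection network are replaced here by hand-picked logits, means, and log standard deviations.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sample_hierarchical_latent(mode_logits, mode_means, mode_log_stds,
                               num_samples, rng):
    """Draw latents from a mixture: pick a mode, then sample within it.

    Hypothetical illustration only. In a forecasting model, the logits
    would come from a learned mode selection step and the per-mode
    Gaussian parameters would be conditioned on lane-level context;
    here they are plain arrays supplied by the caller.
    """
    # High level: mixture weights over the modes.
    weights = softmax(mode_logits)
    modes = rng.choice(len(weights), size=num_samples, p=weights)

    # Low level: reparameterized Gaussian sample for the chosen mode.
    latent_dim = mode_means.shape[1]
    eps = rng.standard_normal((num_samples, latent_dim))
    z = mode_means[modes] + np.exp(mode_log_stds[modes]) * eps
    return modes, z, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical modes (for example, three candidate lanes).
    logits = np.array([2.0, 0.5, -1.0])
    means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
    log_stds = np.full((3, 2), -1.0)
    modes, z, weights = sample_hierarchical_latent(
        logits, means, log_stds, num_samples=5, rng=rng)
    print("mode weights:", weights)
    print("sampled modes:", modes)
    print("latents:\n", z)
```

Because each sample commits to one mode before the within-mode draw, the latents cluster around the per-mode means instead of falling between them, which is the intuition behind avoiding trajectories that lie between adjacent lanes.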

Acknowledgment

This research work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIP) (No. 2020-0-00002, Development of standard SW platform-based autonomous driving technology to solve social problems of mobility and safety for public transport-marginalized communities).

Author information

Authors and Affiliations

Artificial Intelligence Research Laboratory, ETRI, Daejeon, South Korea
Dooseop Choi & KyoungWook Min

Corresponding author

Correspondence to Dooseop Choi.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 504 KB)

Cite this paper

Choi, D., Min, K. (2022). Hierarchical Latent Structure for Multi-modal Vehicle Trajectory Forecasting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_8

DOI: https://doi.org/10.1007/978-3-031-20047-2_8
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science, Computer Science (R0)
