Abstract
The variational autoencoder (VAE) has been widely used for modeling data distributions because it is theoretically elegant, easy to train, and yields nice manifold representations. However, when applied to image reconstruction and synthesis tasks, the VAE tends to generate blurry samples. We observe that a similar problem, in which the generated trajectory lies between adjacent lanes, often arises in VAE-based trajectory forecasting models. To mitigate this problem, we introduce a hierarchical latent structure into the VAE-based forecasting model. Based on the assumption that the trajectory distribution can be approximated as a mixture of simple distributions (or modes), the low-level latent variable is employed to model each mode of the mixture and the high-level latent variable is employed to represent the weights for the modes. To model each mode accurately, we condition the low-level latent variable on two lane-level context vectors computed in novel ways: one corresponds to vehicle-lane interaction and the other to vehicle-vehicle interaction. The context vectors are also used to model the weights via the proposed mode selection network. To evaluate our forecasting model, we use two large-scale real-world datasets. Experimental results show that our model not only generates clear multi-modal trajectory distributions but also outperforms the state-of-the-art (SOTA) models in terms of prediction accuracy. Our code is available at https://github.com/d1024choi/HLSTrajForecast.
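The two-level idea in the abstract can be illustrated with a toy sampler. This is a minimal sketch and not the authors' implementation (which is in the repository linked above): the function names, the lane offsets, the scores standing in for the mode selection network, and the straight-line "decoder" are all hypothetical and chosen only to show a high-level choice of mode followed by a low-level latent drawn within that mode.

```python
import math
import random


def softmax(scores):
    """Turn unnormalized mode scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectory(lane_offsets, lane_scores, rng, horizon=6,
                      step=1.0, latent_std=0.2):
    """Sample one toy trajectory with a two-level latent structure.

    lane_offsets: lateral position of each candidate lane centre (hypothetical).
    lane_scores:  unnormalized score per lane, standing in for the output
                  of a mode selection network (hypothetical values).
    """
    # High level: choose which mode (lane) the sample belongs to.
    weights = softmax(lane_scores)
    mode = rng.choices(range(len(weights)), weights=weights, k=1)[0]

    # Low level: a latent drawn for the chosen mode only.
    z_low = rng.gauss(0.0, latent_std)

    # Toy "decoder": drive straight ahead at a fixed lateral offset.
    lateral = lane_offsets[mode] + z_low
    trajectory = [(step * (t + 1), lateral) for t in range(horizon)]
    return mode, trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        mode, traj = sample_trajectory([-1.75, 1.75], [2.0, 0.5], rng)
        print(mode, round(traj[-1][1], 2))
```

Because each sample commits to one lane before the low-level latent is drawn, samples in this sketch cluster around a lane centre. A single unimodal latent fitted to both lanes at once would instead tend to place probability mass between them, which is the failure the abstract describes.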
Acknowledgment
This research work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIP) (No. 2020-0-00002, Development of standard SW platform-based autonomous driving technology to solve social problems of mobility and safety for public transport-marginalized communities).
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Choi, D., Min, K. (2022). Hierarchical Latent Structure for Multi-modal Vehicle Trajectory Forecasting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_8
DOI: https://doi.org/10.1007/978-3-031-20047-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science, Computer Science (R0)