Progressive Pretext Task Learning for Human Trajectory Prediction

Lin, Xiaotong; Liang, Tianming; Lai, Jianhuang; Hu, Jian-Fang

doi:10.1007/978-3-031-73404-5_12


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15088)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Human trajectory prediction is the practical task of predicting the future positions of pedestrians on the road, and a predicted trajectory typically spans all temporal ranges from short-term to long-term. However, existing works tackle entire-trajectory prediction with a single, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in human trajectories. To overcome this limitation, we introduce a novel Progressive Pretext Task learning (PPT) framework, which progressively enhances the model’s capacity to capture short-term dynamics and long-term dependencies for the final entire-trajectory prediction. Specifically, we design three stages of training tasks in the PPT framework. In the first stage, the model learns short-term dynamics through a stepwise next-position prediction task. In the second stage, the model is further trained to capture long-term dependencies through a destination prediction task. In the final stage, the model tackles the entire future trajectory prediction task, taking full advantage of the knowledge acquired in the previous stages. To alleviate knowledge forgetting, we further apply cross-task knowledge distillation. Additionally, we design a Transformer-based trajectory predictor that achieves highly efficient two-step reasoning by integrating a destination-driven prediction strategy and a group of learnable prompt embeddings. Extensive experiments on popular benchmarks demonstrate that our approach achieves state-of-the-art performance with high efficiency. Code is available at https://github.com/iSEE-Laboratory/PPT.
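The three training stages differ mainly in what each sample asks the model to predict. The sketch below illustrates only that data side of the curriculum and is not the authors' implementation (the official code is in the repository linked above). It assumes trajectories stored as NumPy arrays of shape (T, 2), hypothetical horizons of 8 observed and 12 future steps, and it models cross-task knowledge distillation as a mean-squared-error term, weighted by a hypothetical `lam`, between the current model's outputs and those of a frozen earlier-stage copy. The loss forms actually used in PPT may differ.

```python
import numpy as np

# Hypothetical horizon lengths (assumed for illustration, not taken from the paper).
T_OBS, T_PRED = 8, 12


def stage1_next_position_pairs(traj, t_obs=T_OBS):
    """Stage 1 (short-term dynamics): stepwise next-position samples.

    Each sample pairs all positions up to step t with the position at step t.
    """
    return [(traj[:t], traj[t]) for t in range(t_obs, len(traj))]


def stage2_destination_pair(traj, t_obs=T_OBS):
    """Stage 2 (long-term dependencies): observed past -> final position."""
    return traj[:t_obs], traj[-1]


def stage3_full_trajectory_pair(traj, t_obs=T_OBS):
    """Stage 3: observed past -> entire future trajectory."""
    return traj[:t_obs], traj[t_obs:]


def mse(a, b):
    """Mean squared error between two array-likes of equal shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def stage_loss(pred, target, student_out=None, teacher_out=None, lam=1.0):
    """Task loss plus an optional distillation term (an assumed MSE form).

    The distillation term keeps the current model's outputs close to those
    of a frozen copy trained in an earlier stage, to limit forgetting.
    """
    loss = mse(pred, target)
    if student_out is not None and teacher_out is not None:
        loss += lam * mse(student_out, teacher_out)
    return loss


if __name__ == "__main__":
    # A toy straight-line walk of 20 steps along the x-axis.
    traj = np.stack([np.arange(20.0), np.zeros(20)], axis=1)
    print(len(stage1_next_position_pairs(traj)))  # 12 next-position samples
    print(stage2_destination_pair(traj)[1])       # destination (last position)
```

Stage 1 yields one sample per future step, whereas stages 2 and 3 yield one sample per trajectory. The MSE distillation term is only one plausible way to keep later stages from overwriting earlier knowledge.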



Acknowledgements

This work was partially supported by the NSFC (U21A20471, U22A2095, 62076260, 61772570), the Guangdong Natural Science Funds Project (2023B1515040025), the Guangdong NSF for Distinguished Young Scholar (2022B1515020009), the Guangdong Provincial Key Laboratory of Information Security Technology (2023B1212060026), and the Guangzhou Science and Technology Plan Project (202201011134).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Xiaotong Lin, Tianming Liang, Jianhuang Lai & Jian-Fang Hu
Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China
Jianhuang Lai & Jian-Fang Hu
Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Beijing, China
Jianhuang Lai & Jian-Fang Hu


Corresponding author

Correspondence to Jian-Fang Hu.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol


About this paper

Cite this paper

Lin, X., Liang, T., Lai, J., Hu, JF. (2025). Progressive Pretext Task Learning for Human Trajectory Prediction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15088. Springer, Cham. https://doi.org/10.1007/978-3-031-73404-5_12


DOI: https://doi.org/10.1007/978-3-031-73404-5_12
Published: 30 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73403-8
Online ISBN: 978-3-031-73404-5
eBook Packages: Computer Science, Computer Science (R0)
