How Can I See My Future? FvTraj: Using First-Person View for Pedestrian Trajectory Prediction

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12352)

Abstract

This work presents a novel First-person View based Trajectory prediction model (FvTraj) to estimate the future trajectories of pedestrians in a scene, given their observed trajectories and the corresponding first-person view images. First, given the ground-level 2D trajectories, we render first-person view images using our in-house-built First-person View Simulator (FvSim). Then, based on multi-head attention mechanisms, we design a social-aware attention module to model social interactions between pedestrians, and a view-aware attention module to capture the relations between historical motion states and the visual features extracted from the first-person view images. Our results show that the dynamic scene contexts, together with the ego-motions captured by the first-person view images rendered by FvSim, are valuable and effective for trajectory prediction. Using these simulated first-person view images, our well-structured FvTraj model achieves state-of-the-art performance.
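To make the two attention modules named in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch: a social-aware self-attention module over per-pedestrian motion embeddings, and a view-aware cross-attention module in which historical motion states attend to visual features of the rendered first-person view. All class names, tensor shapes, dimensions, and the residual-plus-LayerNorm layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two attention modules described in the
# abstract. Shapes, dimensions, and the residual + LayerNorm layout are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class SocialAwareAttention(nn.Module):
    """Self-attention across pedestrians in a scene: each pedestrian's
    motion embedding attends to every other pedestrian's embedding."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, motion: torch.Tensor) -> torch.Tensor:
        # motion: (batch, num_pedestrians, embed_dim)
        out, _ = self.attn(motion, motion, motion)
        return self.norm(motion + out)  # residual connection + layer norm


class ViewAwareAttention(nn.Module):
    """Cross-attention from a pedestrian's historical motion states (queries)
    to visual features of the first-person view image (keys/values)."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, motion: torch.Tensor, view: torch.Tensor) -> torch.Tensor:
        # motion: (batch, obs_len, embed_dim)     -- observed motion states
        # view:   (batch, num_patches, embed_dim) -- e.g. CNN patch features
        out, _ = self.attn(motion, view, view)
        return self.norm(motion + out)


if __name__ == "__main__":
    social, view_attn = SocialAwareAttention(), ViewAwareAttention()
    peds = torch.randn(2, 8, 64)       # 2 scenes, 8 pedestrians each
    hist = torch.randn(2, 8, 64)       # 8 observed time steps, one pedestrian
    feats = torch.randn(2, 49, 64)     # 7x7 grid of first-person view features
    print(social(peds).shape)            # torch.Size([2, 8, 64])
    print(view_attn(hist, feats).shape)  # torch.Size([2, 8, 64])
```

Under this reading, the social module is permutation-invariant over the pedestrians in a scene, while the view module treats the rendered first-person image as a set of patch features queried by the pedestrian's own motion history.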

Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grants 2018AAA0103000, 2017YFC0804900, and 2018YFB1700905, and in part by the National Natural Science Foundation of China under Grants 61532002, 61972379, and 61702482. Zhigang Deng was supported in part by US NSF grant IIS-1524782.

Author information

Corresponding author

Correspondence to Huikun Bi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bi, H., Zhang, R., Mao, T., Deng, Z., Wang, Z. (2020). How Can I See My Future? FvTraj: Using First-Person View for Pedestrian Trajectory Prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12352. Springer, Cham. https://doi.org/10.1007/978-3-030-58571-6_34

  • DOI: https://doi.org/10.1007/978-3-030-58571-6_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58570-9

  • Online ISBN: 978-3-030-58571-6

  • eBook Packages: Computer Science, Computer Science (R0)
