Abstract
We present a hierarchical framework for multi-modal trajectory forecasting that provides, for each pedestrian in the scene, a distribution over the next move at every time step. The architecture follows the standard encoder-decoder paradigm: the encoder uses a self-attention mechanism to extract temporal features from motion histories, and the decoder is a stack of LSTMs that generates the future path sequentially. The model is trained discriminatively so that it can differentiate among motion modalities. To this end, we propose a clustering strategy that constructs a so-called transformation set, which works together with the hierarchical LSTMs in the decoder to approximate the real distributions in the training data. Experimental results demonstrate that the proposed framework not only predicts future trajectories accurately but also provides multi-modal trajectory distributions explicitly.
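The idea of a transformation set built by clustering can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or motion representation is used, so everything below is an assumption made for illustration: per-step displacements are clustered with plain k-means, the resulting centroids stand in for the transformation set, and the function names (`displacements`, `build_transformation_set`, `modality_distribution`) are hypothetical rather than taken from the paper.

```python
import math

def displacements(track):
    """Per-step (dx, dy) moves of one trajectory given as a list of (x, y)."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def build_transformation_set(moves, k, iters=20):
    """Assumed stand-in: plain k-means over moves, returning k centroid moves."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [moves[0]]
    while len(centers) < k:
        centers.append(
            max(moves, key=lambda m: min(math.dist(m, c) for c in centers))
        )
    for _ in range(iters):
        # Assign every move to its closest center.
        groups = [[] for _ in centers]
        for m in moves:
            groups[nearest(m, centers)].append(m)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def modality_distribution(moves, centers):
    """Empirical probability of each cluster (candidate move) in the data."""
    counts = [0] * len(centers)
    for m in moves:
        counts[nearest(m, centers)] += 1
    return [c / len(moves) for c in counts]

# Toy data: one pedestrian walking right, one taking a single step up.
moves = displacements([(0, 0), (1, 0), (2, 0), (3, 0)]) + displacements([(0, 0), (0, 1)])
centers = build_transformation_set(moves, k=2)
probs = modality_distribution(moves, centers)
```

Under these assumptions, a discriminatively trained decoder would output a probability for each entry of `centers` at every time step, and `probs` is the kind of empirical distribution such a classifier would try to approximate.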
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 61702073), and the China Postdoctoral Science Foundation (Grant No. 2019M661079).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, Y., Zhang, B., Conci, N., Liu, H. (2021). A Hierarchical Framework for Motion Trajectory Forecasting Based on Modality Sampling. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12664. Springer, Cham. https://doi.org/10.1007/978-3-030-68799-1_17
DOI: https://doi.org/10.1007/978-3-030-68799-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68798-4
Online ISBN: 978-3-030-68799-1
eBook Packages: Computer Science, Computer Science (R0)