Multi-person Pose Forecasting with Individual Interaction Perceptron and Prior Learning

Xiao, Peng; Xie, Yi; Xu, Xuemiao; Chen, Weihong; Zhang, Huaidong

doi:10.1007/978-3-031-72649-1_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15076)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Human pose forecasting is a key problem in understanding human intention and can be addressed by learning from historical poses with deep models. However, existing methods often fail to model each person's role in the event in multi-person scenes, which limits their performance in complicated scenes where varied interactions happen at the same time. In this paper, we introduce the Interaction-Aware Pose Forecasting Transformer (IAFormer) framework to better learn interaction features. Based on the key insight that an event often involves only some of the people in a scene, we design the Interaction Perceptron Module (IPM) to evaluate the level of interaction between each person and the event. Guided by this interaction evaluation, human-independent features are extracted with an attention mechanism for interaction-aware forecasting. In addition, we present an Interaction Prior Learning Module (IPLM) that learns and accumulates prior knowledge of high-frequency interactions, encouraging semantic pose forecasting rather than simple trajectory-based pose forecasting. We conduct experiments on the CMU-Mocap, UMPM, CHI3D, and Human3.6M datasets as well as on synthesized crowd datasets. The results demonstrate that our method significantly outperforms state-of-the-art approaches in scenarios with varying numbers of people. Code is available at https://github.com/ArcticPole/IAFormer.
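The abstract describes the IPM and IPLM only at a high level. The Python sketch below illustrates one plausible reading of the two ideas; it is not the IAFormer implementation (see the official repository for that). Everything concrete here is an assumption made for illustration: the "event" feature is taken to be the mean of all person features, each person's interaction score is a sigmoid of a scaled dot product with that event feature, low-scoring people are down-weighted in attention through an additive log-score bias, and the "prior" is a soft lookup into a fixed bank of vectors. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Branch on sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def interaction_scores(people: List[Vector]) -> List[float]:
    """Hypothetical IPM-style score in (0, 1) per person.

    Assumption: the event feature is the mean of all person features, and a
    person's involvement is their scaled alignment with that event feature.
    """
    dim = len(people[0])
    event = [sum(p[d] for p in people) / len(people) for d in range(dim)]
    scale = math.sqrt(dim)
    return [sigmoid(dot(p, event) / scale) for p in people]


def interaction_aware_features(people: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention over all people, biased by interaction score."""
    dim = len(people[0])
    scores = interaction_scores(people)
    out = []
    for query in people:
        # Adding log(score) multiplies each unnormalized weight by the score,
        # so people judged to be outside the event contribute less.
        logits = [
            dot(query, key) / math.sqrt(dim) + math.log(max(s, 1e-12))
            for key, s in zip(people, scores)
        ]
        weights = softmax(logits)
        context = [sum(w * p[d] for w, p in zip(weights, people)) for d in range(dim)]
        # Residual connection: keep the person's own feature plus the context.
        out.append([q + c for q, c in zip(query, context)])
    return out


def retrieve_prior(feature: Vector, prior_bank: List[Vector]) -> Vector:
    """Hypothetical IPLM-style lookup: soft attention over stored prior vectors."""
    dim = len(feature)
    weights = softmax([dot(feature, prior) / math.sqrt(dim) for prior in prior_bank])
    return [sum(w * prior[d] for w, prior in zip(weights, prior_bank)) for d in range(dim)]
```

For example, with `people = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]`, the third person points away from the other two, receives the lowest interaction score, and therefore has the least influence on everyone's context vector. Using an additive log-score bias keeps the attention weights normalized while letting the interaction estimate act as a soft gate rather than a hard mask.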

Acknowledgements

The work is supported by China National Key R&D Program (Grant No. 2023YFE0202700), Key-Area Research and Development Program of Guangzhou City (No. 2023B01J0022), Guangdong Provincial Natural Science Foundation for Outstanding Youth Team Project (No. 2024B1515040010), National Natural Science Foundation of China (No. 62302170), Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515010187), and Guangzhou Basic and Applied Basic Research Foundation (No. 2024A04J3750).

Author information

Authors and Affiliations

South China University of Technology, Guangzhou, China
Peng Xiao, Yi Xie, Xuemiao Xu, Weihong Chen & Huaidong Zhang
Guangdong Engineering Center for Large Model and GenAI Technology, Guangzhou, China
Xuemiao Xu & Huaidong Zhang
Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, Guangzhou, China
Xuemiao Xu

Corresponding authors

Correspondence to Xuemiao Xu or Huaidong Zhang.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (pdf 291 KB)

About this paper

Cite this paper

Xiao, P., Xie, Y., Xu, X., Chen, W., Zhang, H. (2025). Multi-person Pose Forecasting with Individual Interaction Perceptron and Prior Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_23

DOI: https://doi.org/10.1007/978-3-031-72649-1_23
Published: 30 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72648-4
Online ISBN: 978-3-031-72649-1
eBook Packages: Computer Science, Computer Science (R0)
