Abstract
Human motion prediction aims to automatically forecast a future motion sequence from an observed human motion sequence. In this paper, we propose a novel skip-attention encoder–decoder (SAED) framework that models human motion dependencies in spatiotemporal space, using the encoder to encode the observed motions and the decoder to decode the predicted motions. The framework rests on two main ideas. First, we design a new self-renewing ConvGRU as the basic unit of both the encoder and the decoder to effectively capture temporal and spatial skeleton-motion dependencies. Second, we present a new skip-attention mechanism (SAM) that aggregates the motion information of all layers according to their importance. Quantitative and qualitative results on the Human3.6M and CMU motion capture datasets demonstrate the effectiveness of the proposed SAED compared with related methods.
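The skip-attention idea described above, aggregating per-layer features according to learned importance weights, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation; the layer count, feature dimension, and dot-product scoring function are all hypothetical:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def skip_attention(layer_feats, query):
    """Aggregate features from all layers, weighted by importance.

    layer_feats: (L, D) array, one feature vector per encoder layer.
    query: (D,) vector used to score each layer's relevance.
    Returns the attention-weighted sum of layer features, shape (D,).
    """
    scores = layer_feats @ query   # (L,) raw per-layer importance scores
    weights = softmax(scores)      # normalize scores to a distribution
    return weights @ layer_feats   # (D,) weighted aggregation

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8))  # 3 layers, 8-dim features (illustrative sizes)
q = rng.standard_normal(8)
out = skip_attention(feats, q)
```

In the paper itself the attended features come from ConvGRU layers and the weighting is learned end-to-end; the sketch only shows the aggregation step, i.e. a softmax-weighted sum over layer outputs rather than using the last layer alone.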
The work is supported by the National Key R&D Program of China (No. 2018AAA0102001) and the National Natural Science Foundation of China (Grant Nos. 62072245, and 61932020).
Cite this article
Zhang, R., Shu, X., Yan, R. et al. Skip-attention encoder–decoder framework for human motion prediction. Multimedia Systems 28, 413–422 (2022). https://doi.org/10.1007/s00530-021-00807-4