A multilayer human motion prediction perceptron by aggregating repetitive motion

Geng, Lei; Yang, Wenzhu; Jiao, Yanyan; Zeng, Shuang; Chen, Xinting

doi:10.1007/s00138-023-01447-6

ORIGINAL PAPER
Published: 13 September 2023

Volume 34, article number 98, (2023)
Machine Vision and Applications

Lei Geng¹, Wenzhu Yang¹,², Yanyan Jiao¹, Shuang Zeng¹ & Xinting Chen¹

Abstract

Human motion prediction aims to forecast future human poses given a historical motion sequence. Current state-of-the-art approaches rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNNs) and Graph Convolutional Networks (GCNs), and typically require multiple training stages and many parameters. In addition, existing learning-based methods fail to model the observation that human motion tends to repeat itself. To address this neglect of the repetitive nature of human motion, we first introduce a Multi-level Attention Mechanism (MAM) that explicitly leverages this observation to find relevant historical information for predicting future motion. Instead of modeling frame-wise attention via pose similarity, we extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. In this context, we study different types of attention, computed at the joint, body-part, and full-pose levels. Furthermore, to address the complexity of existing algorithms based on deep learning architectures, we introduce a Fully connected Transpose MLP (FTMLP) model. By combining an MLP network with a fully connected and transposed layer to process the aggregated relevant past movements, the model can quickly and efficiently use motion patterns from the long-term history to predict future poses. Experimental results on the standard motion prediction benchmarks Human3.6M and the CMU motion capture dataset show that our model makes accurate short- and long-term predictions.
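The idea of attending over historical motion sub-sequences, rather than single frames, can be illustrated with a minimal sketch. This is not the paper's MAM: the function name `motion_attention`, the negative squared-distance score, the unparameterized softmax, and the full-pose-only granularity are all assumptions made for illustration. The sketch compares the most recent `context_len` frames with every earlier window of the same length and returns a weighted average of the frames that followed each window.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten(frames):
    """Concatenate a list of pose frames into one flat vector."""
    return [x for frame in frames for x in frame]


def motion_attention(history, context_len, horizon):
    """Hypothetical full-pose motion attention (illustrative only).

    history: list of frames, each frame a list of floats (a flattened pose).
    context_len: number of most recent frames used as the query.
    horizon: number of frames that follow each key window (the values).
    Returns (weights, aggregated), where aggregated has `horizon` frames.
    """
    last_start = len(history) - context_len - horizon
    if last_start < 0:
        raise ValueError("history is too short for this context and horizon")

    query = flatten(history[-context_len:])
    scores, values = [], []
    for start in range(last_start + 1):
        key = flatten(history[start:start + context_len])
        # Assumed similarity: negative squared Euclidean distance.
        scores.append(-sum((q - k) ** 2 for q, k in zip(query, key)))
        # The frames that followed this key window in the past.
        end = start + context_len
        values.append(history[end:end + horizon])

    weights = softmax(scores)
    dim = len(history[0])
    aggregated = [
        [sum(w * v[t][d] for w, v in zip(weights, values)) for d in range(dim)]
        for t in range(horizon)
    ]
    return weights, aggregated


# Toy periodic "motion" with a single coordinate and period 4.
history = [[0.0], [1.0], [0.0], [-1.0]] * 3
weights, aggregated = motion_attention(history, context_len=2, horizon=2)
```

On this toy sequence the largest weights fall on the two earlier windows that match the current context exactly, so the aggregated frames lean toward what followed those windows. A learned model would typically replace the fixed distance score with trainable query and key mappings and pass the aggregate to a predictor such as the FTMLP described in the abstract.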

Acknowledgements

The authors thank the editors and reviewers for their work on this manuscript. The authors also thank the Hebei Province important research project (22370301D) for its financial support and the High-Performance Computing Centre of Hebei University for its support.

Author information

Yanyan Jiao, Shuang Zeng and Xinting Chen have contributed equally to this work.

Authors and Affiliations

School of Cyber Security and Computer, Hebei University, Baoding, 071002, China
Lei Geng, Wenzhu Yang, Yanyan Jiao, Shuang Zeng & Xinting Chen
Machine Vision Engineering Research Center, Hebei University, Baoding, 071002, China
Wenzhu Yang

Corresponding author

Correspondence to Wenzhu Yang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Geng, L., Yang, W., Jiao, Y. et al. A multilayer human motion prediction perceptron by aggregating repetitive motion. Machine Vision and Applications 34, 98 (2023). https://doi.org/10.1007/s00138-023-01447-6

Received: 22 February 2023
Revised: 28 June 2023
Accepted: 10 August 2023
Published: 13 September 2023
DOI: https://doi.org/10.1007/s00138-023-01447-6
