A multilayer human motion prediction perceptron by aggregating repetitive motion

  • ORIGINAL PAPER
  • Published in: Machine Vision and Applications

Abstract

Human motion prediction aims to forecast future human poses from a historical motion sequence. Current state-of-the-art approaches rely on deep learning architectures of considerable complexity, such as Recurrent Neural Networks (RNNs) and Graph Convolutional Networks (GCNs), and typically require multiple training stages and large numbers of parameters. In addition, existing learning-based methods fail to exploit the observation that human motion tends to repeat itself. To address this neglect of the repetitive nature of human motion, we first introduce a Multi-level Attention Mechanism (MAM) that explicitly leverages this observation to retrieve historical information relevant to predicting future motion. Instead of modeling frame-wise attention via pose similarity, motion attention is extracted to capture the similarity between the current motion context and historical motion sub-sequences. In this setting, we study attention computed at the joint, body-part, and full-pose levels. Furthermore, to reduce the complexity of existing deep-learning-based algorithms, we introduce a Fully connected Transpose MLP (FTMLP) model. By combining an MLP network with fully connected and transposed layers to process the aggregated relevant past movements, motion patterns from the long-term history can be used quickly and efficiently to predict future poses. Experimental results on the standard motion prediction benchmarks, the Human3.6M and CMU motion capture datasets, show that our model makes accurate short- and long-term predictions.
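The motion-attention idea described in the abstract, scoring whole historical sub-sequences against the current motion context rather than comparing single poses, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation; all names (`motion_attention`, `sub_len`, etc.) are hypothetical.

```python
import numpy as np

def motion_attention(history, context, sub_len):
    """Score each historical sub-sequence of length `sub_len` against the
    current motion context, then aggregate the sub-sequences by their
    softmax-normalized similarity. history: (T, D) past poses flattened
    per frame; context: (sub_len, D) most recent motion."""
    T, D = history.shape
    q = context.reshape(-1)                     # query: flattened motion context
    scores, values = [], []
    for t in range(T - sub_len + 1):
        sub = history[t:t + sub_len]            # candidate repetitive sub-sequence
        scores.append(q @ sub.reshape(-1))      # similarity of motion, not of one pose
        values.append(sub)
    s = np.asarray(scores)
    w = np.exp(s - s.max())
    w /= w.sum()                                # softmax attention weights
    # weighted aggregation of the relevant past movements
    return np.tensordot(w, np.stack(values), axes=1)  # (sub_len, D)

rng = np.random.default_rng(0)
hist = rng.normal(size=(50, 6))                 # toy motion: 50 frames, 6 dims
ctx = hist[-10:]                                # last 10 frames as the context
agg = motion_attention(hist[:-10], ctx, sub_len=10)
print(agg.shape)  # (10, 6)
```

The aggregated output `agg` would then be fed, together with the observed context, to the prediction network; attention at the joint or body-part level follows the same pattern with the pose dimension `D` restricted to the relevant coordinates.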




Acknowledgements

The authors thank the editors and reviewers for their work on this manuscript. The authors also thank the Hebei Province Important Research Project (22370301D) for financial support, and acknowledge the support of the High-Performance Computing Centre of Hebei University.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wenzhu Yang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Geng, L., Yang, W., Jiao, Y. et al. A multilayer human motion prediction perceptron by aggregating repetitive motion. Machine Vision and Applications 34, 98 (2023). https://doi.org/10.1007/s00138-023-01447-6

