Abstract
Human motion prediction aims to forecast future human poses from an observed sequence of past motion. Current state-of-the-art approaches rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNNs) and Graph Convolutional Networks (GCNs), and typically require multiple training stages and many parameters. In addition, existing learning-based methods fail to model the observation that human motion tends to repeat itself. To address this neglect of the repetitive nature of human motion, we first introduce a Multi-level Attention Mechanism (MAM) that explicitly leverages this observation to find relevant historical information for predicting future motion. Instead of modeling frame-wise attention via pose similarity, we extract motion attention that captures the similarity between the current motion context and historical motion sub-sequences. In this context, we study different types of attention, computed at the joint, body-part, and full-pose levels. Furthermore, to address the complexity of existing algorithms based on deep learning architectures, we introduce a Fully connected Transpose MLP (FTMLP) model. By combining an MLP network with a fully connected and transposed layer to process the aggregated relevant past movements, motion patterns from the long-term history can be used quickly and efficiently to predict future poses. Experimental results on the standard motion prediction benchmarks Human3.6M and the CMU motion capture dataset show that our model makes accurate short- and long-term predictions.
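The idea of motion attention described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `motion_attention`, the use of a softmax over negative squared distances as the similarity score, and the restriction to a single (full-pose) level are all assumptions made for illustration. The sketch compares the most recent `context_len` frames with the start of every historical sub-sequence and returns the similarity-weighted average of those sub-sequences, which a downstream predictor such as an MLP could then process.

```python
import math


def motion_attention(history, context_len, horizon):
    """Aggregate past sub-sequences by similarity to the current motion context.

    history: list of frames, each a list of floats (a flattened pose).
    context_len: number of most recent frames used as the query (assumed name).
    horizon: number of future frames to cover (assumed name).
    Returns (aggregated, weights), where aggregated has context_len + horizon frames.
    """
    n = len(history)
    window = context_len + horizon
    if n < window:
        raise ValueError("history is shorter than context_len + horizon")

    query = history[-context_len:]
    # Every start index whose full sub-sequence of length `window` fits in the history.
    starts = list(range(n - window + 1))

    # Score each candidate by the negative squared distance between its first
    # context_len frames (the key) and the query. This scoring is an assumption.
    scores = []
    for s in starts:
        key = history[s:s + context_len]
        dist = sum((a - b) ** 2
                   for fk, fq in zip(key, query)
                   for a, b in zip(fk, fq))
        scores.append(-dist)

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the full sub-sequences (the values).
    dim = len(history[0])
    aggregated = [[0.0] * dim for _ in range(window)]
    for w, s in zip(weights, starts):
        for t in range(window):
            for d in range(dim):
                aggregated[t][d] += w * history[s + t][d]
    return aggregated, weights


# Toy usage: a one-dimensional "pose" that repeats with period 4.
toy_history = [[v] for v in [0.0, 1.0, 0.0, -1.0] * 5]
toy_aggregated, toy_weights = motion_attention(toy_history, context_len=4, horizon=2)
```

On a periodic toy signal such as the one above, the largest weights fall on sub-sequences that start in phase with the current context, so the tail of the aggregated sequence resembles the continuation of the repeating pattern. A multi-level variant would presumably compute such weights separately on joint, body-part, and full-pose features.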
Acknowledgements
The authors thank the editors and reviewers for their work on this manuscript. The authors also thank the Hebei Province important research project (22370301D) for financial support and the High-Performance Computing Centre of Hebei University for its support.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Geng, L., Yang, W., Jiao, Y. et al. A multilayer human motion prediction perceptron by aggregating repetitive motion. Machine Vision and Applications 34, 98 (2023). https://doi.org/10.1007/s00138-023-01447-6
DOI: https://doi.org/10.1007/s00138-023-01447-6