NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction

Wei, Dong; Sun, Huaijiang; Sun, Xiaoning; Hu, Shengxiang

doi:10.1007/978-3-031-72784-9_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15102)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Accurately predicting future human poses from observed past motion remains challenging because of the spatio-temporal complexity and continuity of motion. Previous historical-value methods typically interpret the motion as discrete consecutive frames, which neglects the continuous temporal dynamics and makes incomplete observations (with missing values) hard to handle. In this paper, we propose a novel implicit Neural Representation method for the task of human Motion prediction, dubbed NeRMo, which represents the motion as a continuous function parameterized by a neural network. The core idea is to explicitly disentangle the spatio-temporal context and output the corresponding 3D skeleton positions. This separate and flexible treatment of space and time gives NeRMo three advantages: it extrapolates at arbitrary temporal locations; it can learn from both complete and incomplete observed past motions; and it provides a unified framework for repairing missing values and forecasting future poses with a single trained model. In addition, we show that NeRMo is compatible with meta-learning methods, which enables it to generalize effectively to unseen time steps. Extensive experiments on classical benchmarks confirm that the proposed method outperforms existing historical-value baselines in both repairing and prediction.
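The idea of treating motion as a continuous function of time, fitted only on the entries that were actually observed, can be illustrated with a small sketch. This is not the NeRMo architecture: the neural network is swapped for a fixed Fourier feature map with a least-squares readout, and every name here (`fourier_features`, `fit_motion`, `query_motion`, `toy_motion`) is hypothetical. The toy motion is constructed to lie exactly in the span of the features, so recovery is exact here; real motion data would not behave this way.

```python
import numpy as np

def fourier_features(t, num_freqs=2):
    """Map times t (shape [T]) to features [T, 2 * num_freqs + 1]."""
    t = np.asarray(t, dtype=np.float64).reshape(-1, 1)
    freqs = 2.0 * np.pi * np.arange(1, num_freqs + 1)
    angles = t * freqs
    return np.concatenate([np.ones_like(t), np.sin(angles), np.cos(angles)], axis=1)

def fit_motion(t_obs, poses, mask, num_freqs=2):
    """Fit one weight vector per pose coordinate from observed entries only.

    poses: [T, D] coordinates, mask: [T, D] bool (True = observed).
    """
    phi = fourier_features(t_obs, num_freqs)
    weights = np.zeros((phi.shape[1], poses.shape[1]))
    for d in range(poses.shape[1]):
        rows = mask[:, d]  # skip frames where this coordinate is missing
        weights[:, d] = np.linalg.lstsq(phi[rows], poses[rows, d], rcond=None)[0]
    return weights

def query_motion(weights, t_query, num_freqs=2):
    """Evaluate the fitted continuous function at arbitrary times."""
    return fourier_features(t_query, num_freqs) @ weights

def toy_motion(t, num_coords=6):
    """Hypothetical ground truth: 2 joints x 3 coordinates, smooth in time."""
    t = np.asarray(t, dtype=np.float64).reshape(-1, 1)
    d = np.arange(num_coords)
    return 0.1 * d + np.sin(2.0 * np.pi * t + 0.3 * d) + 0.5 * np.cos(4.0 * np.pi * t)

# Observe 21 frames on [0, 0.8] and hide every 4th entry of each coordinate.
t_obs = np.linspace(0.0, 0.8, 21)
poses = toy_motion(t_obs)
mask = np.ones_like(poses, dtype=bool)
for d in range(poses.shape[1]):
    mask[d::4, d] = False

weights = fit_motion(t_obs, poses, mask)
repaired = query_motion(weights, t_obs)        # fills in the hidden entries
future = query_motion(weights, [0.85, 0.9])    # times beyond the observed window
```

The same fitted `weights` serve both repair (querying the observed time stamps) and forecasting (querying later time stamps), which mirrors the single-model framing in the abstract. Because the query takes real-valued times, nothing ties it to the original frame grid.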

Acknowledgements

This work was supported by the China Postdoctoral Foundation (No. 2023M741711) and the National Natural Science Foundation of China (Nos. 62176125, 61772272).

Author information

Authors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Dong Wei, Huaijiang Sun, Xiaoning Sun & Shengxiang Hu

Authors

Dong Wei
Huaijiang Sun
Xiaoning Sun
Shengxiang Hu

Corresponding author

Correspondence to Xiaoning Sun.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (PDF, 1644 KB)

Cite this paper

Wei, D., Sun, H., Sun, X., Hu, S. (2025). NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15102. Springer, Cham. https://doi.org/10.1007/978-3-031-72784-9_23

DOI: https://doi.org/10.1007/978-3-031-72784-9_23
Published: 30 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72783-2
Online ISBN: 978-3-031-72784-9
eBook Packages: Computer Science, Computer Science (R0)
