research-article

Deep Human Dynamics Prior

Authors:

Xiaoning Sun

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 4371 - 4379

https://doi.org/10.1145/3474085.3475581

Published: 17 October 2021

Abstract

Motion capture (MoCap) technology aims to provide an accurate record of human motion, with applications in activity analysis, human behavior understanding, and multimedia industries such as animation production and special-effects films. However, because of joint occlusion and limited equipment precision, raw motion data are often corrupted, which severely hinders downstream applications. Recent methods rely on deep neural networks to reconstruct the underlying complete motion from the degraded observation and achieve remarkable results. Unfortunately, because human motions cannot be exhaustively enumerated, a model trained on large-scale data often fails to cover the full range of action categories, which may lead to a sharp decline in the performance of deep learning-based methods. To address these limitations, we propose an untrained deep generative model in which Graph Convolutional Networks (GCNs) efficiently capture the complicated topological relationships among human joints. We show that an untrained GCN architecture with randomly initialized weights is sufficient to extract low-level statistics for human motion reconstruction without any training process. Notably, the performance of our approach is comparable to that of trained models, while its application is not restricted by the availability of training data or a pre-trained network. Moreover, the proposed model even surpasses state-of-the-art methods on samples unseen in the human action database, for both the human motion recovery task and the gap-filling task.
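The idea of fitting an untrained, randomly initialized GCN to a single corrupted sequence can be illustrated with a small sketch. This is not the authors' architecture or loss, which the abstract does not specify. It is a minimal NumPy illustration under stated assumptions: a chain-shaped skeleton, a two-layer graph convolution with a tanh activation, a fixed random input, and a masked squared-error loss minimized by plain gradient descent. The names `normalized_adjacency` and `fit_untrained_gcn_prior`, and all hyperparameters, are hypothetical.

```python
import numpy as np

def normalized_adjacency(num_joints):
    # Assumption: joints form a simple chain (real skeletons are trees).
    A = np.eye(num_joints)  # identity adds self-loops
    for j in range(num_joints - 1):
        A[j, j + 1] = A[j + 1, j] = 1.0
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d))  # D^{-1/2} (A + I) D^{-1/2}

def fit_untrained_gcn_prior(Y, mask, steps=400, lr=0.05, hidden=16, seed=0):
    """Fit a randomly initialized two-layer GCN to ONE corrupted sequence.

    Y    : (joints, frames) observed coordinates (a single axis, for brevity)
    mask : same shape, 1 where an entry was observed, 0 where it is missing
    """
    rng = np.random.default_rng(seed)
    J, T = Y.shape
    A = normalized_adjacency(J)
    Z = rng.normal(size=(J, 8))                 # fixed random input, never updated
    W1 = rng.normal(size=(8, hidden)) / np.sqrt(8)
    W2 = rng.normal(size=(hidden, T)) * 0.1
    AZ = A @ Z
    n_obs = mask.sum()
    losses = []
    for _ in range(steps):
        H = np.tanh(AZ @ W1)                    # first graph convolution
        AH = A @ H
        X = AH @ W2                             # second graph convolution -> motion
        R = mask * (X - Y) / n_obs              # residual on observed entries only
        losses.append(0.5 * np.sum(mask * (X - Y) ** 2) / n_obs)
        grad_W2 = AH.T @ R                      # manual backpropagation
        grad_H = A.T @ (R @ W2.T)
        grad_W1 = AZ.T @ (grad_H * (1.0 - H ** 2))
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    X = (A @ np.tanh(AZ @ W1)) @ W2             # output also fills masked entries
    return X, losses

# Synthetic stand-in for a motion clip: 10 joints, 30 frames, about 30% missing.
frames = np.arange(30)
joints = np.arange(10)[:, None]
clean = np.sin(2 * np.pi * frames / 30 + 0.3 * joints)
mask = (np.random.default_rng(1).random(clean.shape) > 0.3).astype(float)
recon, losses = fit_untrained_gcn_prior(clean * mask, mask)
```

Only the observed entries enter the loss, so any values the network produces at the missing positions come from the network structure and the graph smoothing rather than from training data. In this kind of setup the number of optimization steps typically acts as a regularizer, so `steps` would need tuning on real data.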

Index Terms

Deep Human Dynamics Prior
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Appearance and texture representations

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Project of Science and Technology of Jiangsu Province of China
Postgraduate Research & Practice Innovation Program of Jiangsu Province
National Natural Science Foundation of China

Conference

MM '21: ACM Multimedia Conference

Sponsor: SIGMM

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
