Weakly supervised 2D human pose transfer

Zheng, Qian; Liu, Yajie; Lin, Zhizhao; Lischinski, Dani; Cohen-Or, Daniel; Huang, Hui

doi:10.1007/s11432-021-3301-5


Research Paper
Special Focus on Visual Computing with Machine Learning
Published: 26 October 2021

Volume 64, article number 210103 (2021)
Science China Information Sciences

Qian Zheng¹*, Yajie Liu¹*, Zhizhao Lin¹, Dani Lischinski², Daniel Cohen-Or¹ & Hui Huang¹
*Equal contribution.


Abstract

We present a novel method for pose transfer between two 2D human skeletons. When the bone lengths and proportions of the two skeletons differ significantly, pose transfer becomes a challenging task that cannot be accomplished by simply copying the joint positions or the bone directions. Our data-driven approach uses a deep neural network, trained in a weakly supervised fashion, to encode a skeleton into two separate latent codes: one representing its pose, and another representing the skeleton's proportions (skeleton-ID). The network is given two skeletons and learns to combine the pose of one with the skeleton-ID of the other. Lacking supervision on the poses, we develop a novel loss that qualitatively compares poses of different skeletons. We evaluate the performance of our method on a large set of poses. We demonstrate the advantages of avoiding supervision by transferring extreme poses, as well as poses between skeletons with uncommon proportions.
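The abstract states that copying bone directions is not enough when proportions differ. The sketch below implements only that naive bone-direction baseline, not the authors' network, to make the failure concrete. The five-joint toy skeleton, the `PARENTS` layout, the function names, and the choice to keep the source root fixed are all hypothetical illustrations. Each target bone keeps the source bone's direction but takes the target skeleton's length, walking the kinematic tree from the root.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical toy skeleton: PARENTS[j] is the parent of joint j, -1 marks the root.
# Joints: 0 = root, 1 = neck, 2 = head, 3 = hand, 4 = foot.
# Parents are listed before their children, so one forward pass visits the tree top-down.
PARENTS: List[int] = [-1, 0, 1, 1, 0]


def bone_lengths(joints: Sequence[Point], parents: Sequence[int]) -> List[float]:
    """Length of the bone ending at each joint (0.0 for the root)."""
    lengths: List[float] = []
    for j, p in enumerate(parents):
        if p < 0:
            lengths.append(0.0)
        else:
            lengths.append(math.hypot(joints[j][0] - joints[p][0],
                                      joints[j][1] - joints[p][1]))
    return lengths


def copy_bone_directions(source_pose: Sequence[Point],
                         target_skeleton: Sequence[Point],
                         parents: Sequence[int]) -> List[Point]:
    """Naive baseline: source bone directions combined with target bone lengths."""
    target_len = bone_lengths(target_skeleton, parents)
    out: List[Point] = []
    for j, p in enumerate(parents):
        if p < 0:
            out.append(source_pose[j])  # assumption: keep the source root position
            continue
        if p >= j:
            raise ValueError("parents must precede their children")
        dx = source_pose[j][0] - source_pose[p][0]
        dy = source_pose[j][1] - source_pose[p][1]
        norm = math.hypot(dx, dy) or 1.0  # zero-length bone: child stays on its parent
        px, py = out[p]
        out.append((px + target_len[j] * dx / norm, py + target_len[j] * dy / norm))
    return out


# Source pose: the hand rests exactly on the head.
src = [(0.0, 0.0), (0.0, 2.0), (0.0, 3.0), (0.0, 3.0), (0.0, -2.0)]
# Target skeleton with a short neck and head bone but a long arm (only its lengths are used).
tgt = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.5), (3.0, 1.0), (0.0, -1.0)]
result = copy_bone_directions(src, tgt, PARENTS)
```

The result has exactly the target's bone lengths, yet the hand lands at (0.0, 4.0) while the head sits at (0.0, 1.5): the hand-on-head contact of the source pose is lost. Failures of this kind are what motivate learning a pose representation that is separate from the skeleton's proportions.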



Acknowledgements

This work was supported in part by National Natural Science Foundation of China (Grant No. U2001206), GD Talent Program (Grant No. 2019JC05X328), GD Science and Technology Program (Grant Nos. 2020A0505100064, 2015A030312015), DEGP Key Project (Grant Nos. 2018KZDXM058, 2020SFKC059), Shenzhen Science and Technology Program (Grant Nos. RCJC20200714114435012, JCYJ20180305125709986), National Engineering Laboratory for Big Data System Computing Technology, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).

Author information

Qian Zheng and Yajie Liu contributed equally to this work.

Authors and Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
Qian Zheng, Yajie Liu, Zhizhao Lin, Daniel Cohen-Or & Hui Huang
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91905, Israel
Dani Lischinski


Corresponding author

Correspondence to Hui Huang.

Additional information

Supporting information

Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

Supplementary File