Abstract
We present a novel method for pose transfer between two 2D human skeletons. When the bone lengths and proportions of the two skeletons differ significantly, pose transfer becomes a challenging task that cannot be accomplished by simply copying the joint positions or the bone directions. Our data-driven approach uses a deep neural network trained, in a weakly supervised fashion, to encode a skeleton into two separate latent codes: one representing its pose, and another representing the skeleton’s proportions (skeleton-ID). The network is given two skeletons and learns to combine the pose of one with the skeleton-ID of the other. Lacking supervision on the poses, we develop a novel loss that qualitatively compares poses of different skeletons. We evaluate the performance of our method on a large set of poses. The advantages of avoiding supervision are demonstrated by showing transfer of extreme poses, as well as transfer between uncommon skeleton proportions.
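To make the naive baseline mentioned above concrete, the following minimal sketch copies the bone directions of one skeleton onto the bone lengths of another. It is not the proposed network; it only illustrates the simple strategy that the abstract describes as insufficient when proportions differ strongly. The function name, the joint names, and the toy three-joint skeletons are hypothetical and chosen purely for illustration.

```python
import math


def copy_bone_directions(source, target, bones, root):
    """Naive retargeting: source bone directions, target bone lengths.

    source, target: dict mapping joint name -> (x, y)
    bones: list of (parent, child) pairs, parents listed before children
    root: joint that anchors the result (kept at the target's position)
    """
    result = {root: target[root]}
    for parent, child in bones:
        # Bone vector in the source pose and in the target skeleton.
        sx = source[child][0] - source[parent][0]
        sy = source[child][1] - source[parent][1]
        tx = target[child][0] - target[parent][0]
        ty = target[child][1] - target[parent][1]
        src_len = math.hypot(sx, sy)
        tgt_len = math.hypot(tx, ty)
        if src_len == 0.0:
            # Direction is undefined in the source: keep the target's offset.
            dx, dy = tx, ty
        else:
            # Unit direction from the source, scaled to the target's length.
            dx, dy = sx / src_len * tgt_len, sy / src_len * tgt_len
        px, py = result[parent]
        result[child] = (px + dx, py + dy)
    return result


# Hypothetical toy chain: hip -> knee -> ankle.
BONES = [("hip", "knee"), ("knee", "ankle")]
source_pose = {"hip": (0.0, 0.0), "knee": (0.0, 2.0), "ankle": (2.0, 2.0)}
target_skeleton = {"hip": (1.0, 1.0), "knee": (2.0, 1.0), "ankle": (2.0, 4.0)}

retargeted = copy_bone_directions(source_pose, target_skeleton, BONES, "hip")
```

In this toy case the result keeps the target's bone lengths (1 and 3) while following the source's directions. Because each bone is handled independently, such a rule has no notion of what makes two poses of differently proportioned skeletons look alike, which is the gap the learned pose and skeleton-ID codes are meant to address.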
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (Grant No. U2001206), GD Talent Program (Grant No. 2019JC05X328), GD Science and Technology Program (Grant Nos. 2020A0505100064, 2015A03031-2015), DEGP Key Project (Grant Nos. 2018KZDXM058, 2020SFKC059), Shenzhen Science and Technology Program (Grant Nos. RCJC2020071411-4435012, JCYJ20180305125709986), National Engineering Laboratory for Big Data System Computing Technology, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
Supporting information
Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
About this article
Cite this article
Zheng, Q., Liu, Y., Lin, Z. et al. Weakly supervised 2D human pose transfer. Sci. China Inf. Sci. 64, 210103 (2021). https://doi.org/10.1007/s11432-021-3301-5
DOI: https://doi.org/10.1007/s11432-021-3301-5