Weakly supervised 2D human pose transfer

  • Research Paper
  • Special Focus on Visual Computing with Machine Learning
Science China Information Sciences

Abstract

We present a novel method for pose transfer between two 2D human skeletons. When the bone lengths and proportions of the two skeletons differ significantly, pose transfer becomes challenging and cannot be accomplished by simply copying joint positions or bone directions. Our data-driven approach uses a deep neural network, trained in a weakly supervised fashion, to encode a skeleton into two separate latent codes: one representing its pose and the other representing its proportions (skeleton-ID). The network is given two skeletons and learns to combine the pose of one with the skeleton-ID of the other. Lacking supervision on the poses, we develop a novel loss that qualitatively compares poses of different skeletons. We evaluate the performance of our method on a large set of poses. The advantages of avoiding supervision are demonstrated by transferring extreme poses and by transferring between skeletons with uncommon proportions.
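To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of the naive "copy the bone directions" transfer, which the abstract describes as insufficient when proportions differ. It is not the authors' network or loss. The 5-joint chain `PARENTS` and the function names `bone_lengths` and `copy_bone_directions` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical 5-joint chain (an assumption for illustration):
# each entry is the parent index of that joint, -1 marks the root.
PARENTS = [-1, 0, 1, 2, 3]


def bone_lengths(joints, parents=PARENTS):
    """Length of the bone from each joint to its parent (0.0 for the root)."""
    lengths = []
    for j, p in enumerate(parents):
        if p < 0:
            lengths.append(0.0)
        else:
            lengths.append(math.hypot(joints[j][0] - joints[p][0],
                                      joints[j][1] - joints[p][1]))
    return lengths


def copy_bone_directions(source_joints, target_lengths, parents=PARENTS):
    """Naive transfer: keep each source bone direction, use the target bone length.

    Assumes every parent appears before its children in `parents`.
    """
    out = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:
            out[j] = (float(source_joints[j][0]), float(source_joints[j][1]))
            continue
        dx = source_joints[j][0] - source_joints[p][0]
        dy = source_joints[j][1] - source_joints[p][1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            # Degenerate source bone: no direction to copy, collapse onto parent.
            out[j] = out[p]
            continue
        scale = target_lengths[j] / norm
        out[j] = (out[p][0] + dx * scale, out[p][1] + dy * scale)
    return out


# Source pose: a staircase-shaped chain with unit-length bones.
source = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
# Target proportions: very different bone lengths (root entry unused).
target_lengths = [0.0, 2.0, 0.5, 2.0, 0.5]
result = copy_bone_directions(source, target_lengths)
```

In this toy case the bone directions are preserved exactly, but the end joint moves from (2, 2) in the source to (4.0, 1.0) in the result. Any relationship between joints that are not directly connected, such as a hand touching the head, is therefore not guaranteed to survive, which is consistent with the abstract's remark that copying directions alone does not accomplish pose transfer.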

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. U2001206), GD Talent Program (Grant No. 2019JC05X328), GD Science and Technology Program (Grant Nos. 2020A0505100064, 2015A03031-2015), DEGP Key Project (Grant Nos. 2018KZDXM058, 2020SFKC059), Shenzhen Science and Technology Program (Grant Nos. RCJC2020071411-4435012, JCYJ20180305125709986), the National Engineering Laboratory for Big Data System Computing Technology, and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).

Author information

Corresponding author

Correspondence to Hui Huang.

Additional information

Supporting information

Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

About this article

Cite this article

Zheng, Q., Liu, Y., Lin, Z. et al. Weakly supervised 2D human pose transfer. Sci. China Inf. Sci. 64, 210103 (2021). https://doi.org/10.1007/s11432-021-3301-5

  • DOI: https://doi.org/10.1007/s11432-021-3301-5
