C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13665))

Abstract

This paper is the first to pose and address weakly supervised 3D human pose estimation (HPE) from point clouds, by propagating the pose prior within unlabelled RGB-point cloud sequences to the 3D domain. Our approach, termed C3P, does not require any labor-intensive 3D keypoint annotation for training. To this end, we propose to transfer the 2D HPE annotation information in existing large-scale RGB datasets (e.g., MS COCO) to the 3D task, using easily acquired unlabelled RGB-point cloud sequences to link the 2D and 3D domains. Self-supervised 3D HPE cues within the point cloud sequence are also exploited, in the form of spatial-temporal constraints on human body symmetry, skeleton length and joint motion. In addition, a refined point set network structure for weakly supervised 3D HPE is proposed in an encoder-decoder manner. Experiments on the CMU Panoptic and ITOP datasets demonstrate that our method achieves results comparable to its fully supervised 3D state-of-the-art counterparts. When large-scale unlabelled data (e.g., NTU RGB+D 60) is used, our approach can even outperform them under the more challenging cross-setup test setting. The source code is released at https://github.com/wucunlin/C3P for research use only.
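The spatial-temporal constraints named in the abstract (body symmetry, skeleton length and joint motion) can be illustrated with a minimal sketch. This is not the C3P implementation: the 5-joint toy skeleton, the bone pairing, the function names and the L1-style penalties are all assumptions made for illustration, and the released code may formulate these terms differently.

```python
import math

# Hypothetical toy skeleton (NOT the joint layout used in the paper):
# 0 = root, 1 = left shoulder, 2 = left hand, 3 = right shoulder, 4 = right hand.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]   # (parent joint, child joint)
SYMMETRIC_BONES = [(0, 2), (1, 3)]         # index pairs into BONES: left vs. right


def bone_lengths(pose):
    """Euclidean length of every bone in one pose (list of (x, y, z) joints)."""
    return [math.dist(pose[a], pose[b]) for a, b in BONES]


def symmetry_loss(pose):
    """Mean absolute length difference between left/right bone pairs."""
    lengths = bone_lengths(pose)
    diffs = [abs(lengths[i] - lengths[j]) for i, j in SYMMETRIC_BONES]
    return sum(diffs) / len(diffs)


def length_consistency_loss(sequence):
    """Mean absolute deviation of each bone length from its mean over time."""
    per_frame = [bone_lengths(pose) for pose in sequence]
    total = 0.0
    for b in range(len(BONES)):
        values = [frame[b] for frame in per_frame]
        mean = sum(values) / len(values)
        total += sum(abs(v - mean) for v in values) / len(values)
    return total / len(BONES)


def motion_smoothness_loss(sequence):
    """Mean per-joint displacement between consecutive frames."""
    if len(sequence) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(sequence, sequence[1:]):
        total += sum(math.dist(p, c) for p, c in zip(prev, curr)) / len(curr)
    return total / (len(sequence) - 1)


if __name__ == "__main__":
    pose = [(0, 0, 0), (-1, 0, 0), (-2, 0, 0), (1, 0, 0), (2, 0, 0)]
    shifted = [(x + 1, y, z) for x, y, z in pose]
    print(symmetry_loss(pose))
    print(length_consistency_loss([pose, shifted]))
    print(motion_smoothness_loss([pose, shifted]))
```

In a training setup, terms of this kind would typically be computed on predicted 3D joints and added as weighted regularizers alongside the 2D-derived supervision; the weighting is likewise left unspecified here.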


Acknowledgements

This work is jointly supported by the National Natural Science Foundation of China (Grant Nos. 61502187 and 61876211). Joey Tianyi Zhou is supported by the SERC Central Research Fund (Use-inspired Basic Research) and Programmatic Grant No. A18A1b0045 from the Singapore government’s Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain).

Author information

Corresponding author

Correspondence to Yang Xiao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, C., Xiao, Y., Zhang, B., Zhang, M., Cao, Z., Zhou, J.T. (2022). C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13665. Springer, Cham. https://doi.org/10.1007/978-3-031-20065-6_32

  • DOI: https://doi.org/10.1007/978-3-031-20065-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20064-9

  • Online ISBN: 978-3-031-20065-6

  • eBook Packages: Computer Science (R0)
