PoseGate-Former: Transformer Encoder with Trainable Gate for 3D Human Pose Estimation Using Weakly Supervised Learning

Guan, Shannan; Lu, Haiyan; Zhu, Linchao; Fang, Gengfa

doi:10.1007/978-3-030-92310-5_31

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1517)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Weakly supervised learning for 3D human pose estimation can learn a realistic human structure, but it generally has lower accuracy in reconstructing 3D poses. In this work, we present PoseGate-Former, a 3D pose estimation model built on a Transformer encoder architecture with a trainable gate. The model is trained on individual images using a weakly supervised learning approach. Adding a trainable gate to the Transformer encoder reduces the possibility of overfitting on some action categories. We evaluated the model on two benchmark datasets, Human3.6M and HumanEva-I. The experimental results show that it obtains substantially better accuracy in all action categories of 3D human poses in these datasets than some fully supervised 3D pose estimation approaches.
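
The abstract does not say how the trainable gate is wired into the encoder. As a rough illustration only, the sketch below shows one common way a learned gate can scale a sub-layer's contribution to a residual stream. The function names, the scalar gate, and the sigmoid squashing are all assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real-valued logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_residual(x, sublayer_out, gate_logit):
    """Hypothetical gated residual: x + sigmoid(gate_logit) * sublayer_out.

    x            -- input features (list of floats)
    sublayer_out -- output of an attention or feed-forward sub-layer, same length as x
    gate_logit   -- scalar that would be a trainable parameter in a real model
                    (assumption: the paper's gate may take a different form)
    """
    g = sigmoid(gate_logit)
    return [xi + g * si for xi, si in zip(x, sublayer_out)]


# With a logit of 0 the gate passes half of the sub-layer output.
half_open = gated_residual([1.0, 2.0], [10.0, 10.0], 0.0)

# With a strongly negative logit the layer is close to an identity mapping.
nearly_closed = gated_residual([1.0, 2.0], [10.0, 10.0], -50.0)
```

Under this assumed form, a gate that learns to stay nearly closed leaves the input almost unchanged, which is one plausible way a gate could limit how strongly a layer fits particular action categories.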

Author information

Authors and Affiliations

Australia Artificial Intelligence Institute, University of Technology Sydney, Ultimo, Australia
Shannan Guan, Haiyan Lu & Linchao Zhu
Global Big Data Technologies Centre, University of Technology Sydney, Ultimo, Australia
Gengfa Fang

Corresponding authors

Correspondence to Shannan Guan, Haiyan Lu, Linchao Zhu or Gengfa Fang.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Guan, S., Lu, H., Zhu, L., Fang, G. (2021). PoseGate-Former: Transformer Encoder with Trainable Gate for 3D Human Pose Estimation Using Weakly Supervised Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_31

DOI: https://doi.org/10.1007/978-3-030-92310-5_31
Published: 02 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5
eBook Packages: Computer Science, Computer Science (R0)
