Abstract
Weakly supervised learning for 3D human pose estimation can learn realistic human structure, but it generally reconstructs 3D poses less accurately than fully supervised learning. In this work, we present PoseGate-Former, a 3D pose estimation model built on a Transformer encoder architecture with a trainable gate. The model is trained on individual images using a weakly supervised learning approach. Adding the trainable gate to the Transformer encoder can reduce the possibility of overfitting on some action categories. We evaluated the model on two benchmark datasets, Human3.6M and HumanEva-I. The experimental results show that it achieves substantially better accuracy across all action categories of 3D human poses in these datasets than some fully supervised 3D pose estimation approaches.
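The abstract does not say how the trainable gate is wired into the encoder, so the following is only a rough sketch of one common gating pattern, not the authors' design. It assumes a sigmoid gate, computed from the input tokens, that blends the self-attention output with the unchanged input. The names `gated_encoder_block`, `init_params`, `wg` and `bg`, the 17-joint input, and the token width of 8 are all hypothetical, and layer normalisation, multiple heads and the feed-forward sublayer are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over joint tokens.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def gated_encoder_block(x, params):
    """x has shape (num_joints, dim). The gate form is an assumption."""
    attn = self_attention(x, params["wq"], params["wk"], params["wv"])
    # Trainable gate in (0, 1), one value per token feature.
    gate = sigmoid(x @ params["wg"] + params["bg"])
    # gate -> 0 passes the input through; gate -> 1 uses attention only.
    return gate * attn + (1.0 - gate) * x


def init_params(dim, seed=0):
    # Hypothetical random initialisation, for illustration only.
    rng = np.random.default_rng(seed)
    names = ("wq", "wk", "wv", "wg")
    params = {n: rng.normal(0.0, dim ** -0.5, (dim, dim)) for n in names}
    params["bg"] = np.zeros(dim)
    return params


# 17 joint tokens of width 8 (both sizes are assumed, not from the paper).
tokens = np.random.default_rng(1).normal(size=(17, 8))
out = gated_encoder_block(tokens, init_params(8))
```

In this sketch the gate bias `bg` controls how much of the attention update is let through: a strongly negative bias makes the block close to an identity mapping, which is one way a learned gate could limit how strongly the encoder fits any single action category.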
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Guan, S., Lu, H., Zhu, L., Fang, G. (2021). PoseGate-Former: Transformer Encoder with Trainable Gate for 3D Human Pose Estimation Using Weakly Supervised Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_31
DOI: https://doi.org/10.1007/978-3-030-92310-5_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5
eBook Packages: Computer Science, Computer Science (R0)