GARNet: Graph Attention Residual Networks Based on Adversarial Learning for 3D Human Pose Estimation

Chen, Zhihua; Liu, Xiaoli; Sheng, Bing; Li, Ping

doi:10.1007/978-3-030-61864-3_24

Conference paper
First Online: 18 October 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12221)

Abstract

Recent studies have shown that complex network architectures have driven great progress in estimating the pose and shape of a 3D human from a single image. However, existing methods fail to produce accurate and natural results across different environments. In this paper, we propose a novel adversarial learning approach and study the problem of learning graph attention networks for regression. Graph Attention Residual Networks (GARNet), which process regression tasks on graph-structured data, learn to capture semantic information, such as local and global node relationships, through end-to-end training without additional supervision. The adversarial learning module is implemented by a novel multi-source discriminator network to learn the mapping from the 2D pose distribution to the 3D pose distribution. We conduct a comprehensive study to verify the effectiveness of our method. Experiments show that our method outperforms most existing techniques.

Supported by the National Natural Science Foundation of China (Grant Nos. 61672228 and 61370174) and the Shanghai Automotive Industry Science and Technology Development Foundation (No. 1837).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
Zhihua Chen & Xiaoli Liu
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Bing Sheng
Faculty of Information Technology, Macau University of Science and Technology, Macau, 999078, China
Ping Li

Corresponding author

Correspondence to Zhihua Chen.

Editor information

Editors and Affiliations

University of Geneva, Geneva, Switzerland
Nadia Magnenat-Thalmann
University of Crete, Heraklion, Greece
Constantine Stephanidis
University of Macau, Macau, China
Enhua Wu
Swiss Federal Institute of Technology, Lausanne, Switzerland
Daniel Thalmann
Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
University of Sydney, Sydney, Australia
Jinman Kim
University of Crete, Heraklion, Greece
George Papagiannakis
University of Calgary, Calgary, AB, Canada
Marina Gavrilova

About this paper

Cite this paper

Chen, Z., Liu, X., Sheng, B., Li, P. (2020). GARNet: Graph Attention Residual Networks Based on Adversarial Learning for 3D Human Pose Estimation. In: Magnenat-Thalmann, N., et al. Advances in Computer Graphics. CGI 2020. Lecture Notes in Computer Science, vol 12221. Springer, Cham. https://doi.org/10.1007/978-3-030-61864-3_24

DOI: https://doi.org/10.1007/978-3-030-61864-3_24
Published: 18 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61863-6
Online ISBN: 978-3-030-61864-3
eBook Packages: Computer Science, Computer Science (R0)
