Adversarial Semantic Data Augmentation for Human Pose Estimation

Bin, Yanrui; Cao, Xuan; Chen, Xinya; Ge, Yanhao; Tai, Ying; Wang, Chengjie; Li, Jilin; Huang, Feiyue; Gao, Changxin; Sang, Nong

doi:10.1007/978-3-030-58529-7_36

Yanrui Bin ORCID: orcid.org/0000-0003-2845-3928¹²,
Xuan Cao¹³,
Xinya Chen ORCID: orcid.org/0000-0002-6537-4316¹²,
Yanhao Ge¹³,
Ying Tai¹³,
Chengjie Wang¹³,
Jilin Li¹³,
Feiyue Huang¹³,
Changxin Gao ORCID: orcid.org/0000-0003-2736-3920¹² &
…
Nong Sang ORCID: orcid.org/0000-0002-9167-1496¹²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12364)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Human pose estimation is the task of localizing body keypoints in still images. State-of-the-art methods suffer from a shortage of training examples for challenging cases such as symmetric appearance, heavy occlusion, and nearby persons. To increase the number of challenging cases, previous methods augmented images by cropping and pasting image patches with weak semantics, which leads to unrealistic appearance and limited diversity. We instead propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts of various semantic granularity. Furthermore, we propose Adversarial Semantic Data Augmentation (ASDA), which uses a generative network to dynamically predict a tailored pasting configuration. Given an off-the-shelf pose estimation network as the discriminator, the generator seeks the most confusing transformation to increase the discriminator's loss, while the discriminator takes the generated sample as input and learns from it. The whole pipeline is optimized in an adversarial manner. State-of-the-art results are achieved on challenging benchmarks. The code is publicly available at https://github.com/Binyr/ASDA.
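
The pasting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes that a segmented body part comes with a mask in [0, 1] and that a pasting configuration reduces to a target position. The function name `paste_part` and its parameters are hypothetical, and scale and rotation are left out for brevity.

```python
import numpy as np

def paste_part(image, part, mask, top, left):
    """Composite a segmented part onto a copy of `image` at (top, left).

    Hypothetical helper for illustration only.
    image: (H, W, C) array; part: (h, w, C) array;
    mask: (h, w) array in [0, 1], 1 where the part is visible.
    """
    out = image.astype(np.float32).copy()
    h, w = part.shape[:2]
    H, W = out.shape[:2]
    # Clip the paste window to the image bounds.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + h, H), min(left + w, W)
    if y0 >= y1 or x0 >= x1:
        # The part lies entirely outside the image; nothing to paste.
        return out.astype(image.dtype)
    # Select the region of the part and mask that falls inside the image.
    p = part[y0 - top:y1 - top, x0 - left:x1 - left].astype(np.float32)
    m = mask[y0 - top:y1 - top, x0 - left:x1 - left].astype(np.float32)[..., None]
    # Blend: part pixels where the mask is 1, original pixels where it is 0.
    out[y0:y1, x0:x1] = m * p + (1.0 - m) * out[y0:y1, x0:x1]
    return out.astype(image.dtype)
```

In an adversarial setup such as the one the abstract outlines, the position passed to a function like this would be predicted by the generator rather than sampled at random; that part is not shown here.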

Acknowledgement

This work was supported by the National Natural Science Foundation of China under grant 61871435 and the Fundamental Research Funds for the Central Universities no. 2019kfyXKJC024.

Author information

Authors and Affiliations

Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Yanrui Bin, Xinya Chen, Changxin Gao & Nong Sang
Tencent Youtu Lab, Shanghai, China
Xuan Cao, Yanhao Ge, Ying Tai, Chengjie Wang, Jilin Li & Feiyue Huang

Corresponding author

Correspondence to Nong Sang.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Bin, Y. et al. (2020). Adversarial Semantic Data Augmentation for Human Pose Estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12364. Springer, Cham. https://doi.org/10.1007/978-3-030-58529-7_36

DOI: https://doi.org/10.1007/978-3-030-58529-7_36
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58528-0
Online ISBN: 978-3-030-58529-7
eBook Packages: Computer Science, Computer Science (R0)
