research-article

Open access

Learning Positional Priors for Pretraining 2D Pose Estimators

Authors:

Chuanguang Yang,

Tianyao Zheng

HUMA'21: Proceedings of the 2nd International Workshop on Human-centric Multimedia Analysis

Pages 3 - 11

https://doi.org/10.1145/3475723.3484252

Published: 25 November 2021

Abstract

The goal of 2D human pose estimation is to locate the keypoints of body parts in 2D images. State-of-the-art pose estimation methods usually construct pixel-wise heatmaps from keypoints as labels for training neural networks, whose backbones are usually initialized either randomly or from classification models trained on a large dataset such as ImageNet. Statistical data show strong positional priors for human keypoints, which depend heavily on the relationships between image patches. To learn positional priors for pretraining pose estimators, we propose the Heatmap-Style Jigsaw Puzzles (HSJP) problem as a self-supervised pretext task, whose goal is to predict the location of each patch in an image composed of shuffled patches. During pretraining, we use only the person images in MS-COCO, rather than introducing an extra large dataset such as ImageNet. We design a heatmap-style label for patch location, and the learning process is non-contrastive. The weights learned by the HSJP pretext task are used to initialize the backbones of 2D human pose estimators, which are then fine-tuned on the MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate mAP on both the MS-COCO validation and test-dev sets. Our experiments show that downstream pose estimators with our self-supervised pretraining perform much better than those trained from scratch and are comparable to those using ImageNet classification models as their initial backbones.
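To make the pretext task more concrete, the following is a minimal sketch of how a shuffled-patch input and heatmap-style location labels might be generated. It is not the authors' implementation. The abstract does not specify the grid size, patch size, label layout, or Gaussian width, so the function names (`shuffle_patches`, `gaussian_heatmap`, `patch_location_labels`), the square grid, the parameter `sigma`, and the choice of one heatmap channel per original patch index peaked at the centre of the slot that patch now occupies are all assumptions made for illustration.

```python
import math
import random


def shuffle_patches(grid, rng):
    """Return a random permutation: shuffled slot i holds original patch perm[i].

    Uses the in-place Fisher-Yates (Durstenfeld) shuffle.
    """
    perm = list(range(grid * grid))
    for i in range(len(perm) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def gaussian_heatmap(height, width, cx, cy, sigma):
    """2D Gaussian with peak value 1.0 at (cx, cy), as a list of rows."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def patch_location_labels(perm, grid, patch_size, sigma=2.0):
    """Build one heatmap per original patch index (assumed label layout).

    Channel k peaks at the centre of the slot where original patch k
    sits in the shuffled image, mirroring keypoint heatmap labels.
    """
    size = grid * patch_size
    labels = [None] * len(perm)
    for slot, original in enumerate(perm):
        row, col = divmod(slot, grid)
        cx = col * patch_size + (patch_size - 1) / 2
        cy = row * patch_size + (patch_size - 1) / 2
        labels[original] = gaussian_heatmap(size, size, cx, cy, sigma)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    perm = shuffle_patches(3, rng)            # 3x3 puzzle (assumed size)
    labels = patch_location_labels(perm, 3, 5)
    print(len(labels), len(labels[0]), len(labels[0][0]))
```

Under these assumptions the label tensor has the same form as a keypoint heatmap target (one channel per item to localise), which would let a pose-estimation backbone and head be trained on the pretext task with an ordinary pixel-wise regression loss.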

Supplementary Material

MP4 File (HUMA21-fp3505.mp4)

The presentation of paper "Learning Positional Priors for Pretraining 2D Pose Estimators"

Cited By

Yang C, An Z, Cai L, Xu Y. (2022). Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution. IEEE Transactions on Neural Networks and Learning Systems, 1-15. Online publication date: 2022. https://doi.org/10.1109/TNNLS.2022.3186807

Index Terms

Learning Positional Priors for Pretraining 2D Pose Estimators
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Biometrics


Published In

HUMA'21: Proceedings of the 2nd International Workshop on Human-centric Multimedia Analysis

November 2021

50 pages

ISBN:9781450386715

DOI:10.1145/3475723

General Chairs:
Wu Liu (AI Research of JD.com, China)
Junbo Guo (State Key Laboratory of Communication Content Cognition, People's Daily Online, China)
John Smith (IBM Research, USA)

Program Chairs:
Xinchen Liu (AI Research of JD.com, China)
Dingwen Zhang (Northwestern Polytechnical University, China)
Wenbing Huang (Tsinghua University, China)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Strategic Priority Research Program of the Chinese Academy of Sciences
Equipment PreResearch Fund

Conference

MM '21: ACM Multimedia Conference
Sponsor: SIGMM
October 20, 2021
Virtual Event, China
