Fixed-resolution representation network for human pose estimation

Liu, Yongxiang; Hou, Xiaorong

doi:10.1007/s00530-022-00919-5


Regular Paper
Published: 01 April 2022

Volume 28, pages 1597–1609 (2022)

Multimedia Systems


Abstract

Human pose estimation from a single image is a fundamental yet challenging task in computer vision. Most existing methods, such as Hourglass, HRNet and their variants, first generate multi-resolution representations by progressively reducing the resolution from high to low, then recover high-resolution representations from the low-resolution ones and use them to produce the final pose heatmaps. In this paper, we propose a novel architecture for human pose estimation, the fixed-resolution representation network, which keeps a fixed resolution throughout the whole network in order to preserve rich spatial-structural information. First, an Improved Pyramid Convolutional Bottleneck (IPCB) is proposed to encode feature maps with multiple receptive fields at the same resolution. Second, we introduce an efficient channel attention mechanism to strengthen the feature extraction and information selection capability of IPCB, which further improves its performance. Third, to correct the bias introduced by the flip test at inference time, we adopt an existing technique, Unbiased Data Processing. Fourth, because the model structure has changed and computing resources are limited, we introduce an iterative retraining strategy to address the pre-training problem. Experiments on two benchmark keypoint detection datasets, MPII and LSP, demonstrate the effectiveness of our method: with only 1.7M parameters and 3G FLOPs, it achieves 89.5 PCKh@0.5 on MPII and 92.7 PCK@0.2 on LSP, which is competitive with state-of-the-art methods.
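The efficient channel attention mentioned in the abstract is assumed here to be the ECA-Net module listed in the references (Wang et al., CVPR 2020): global average pooling produces one descriptor per channel, a 1-D convolution of odd size k mixes each descriptor with its neighbouring channels, and a sigmoid turns the result into a per-channel gate. The following is a minimal pure-Python sketch of that forward pass on a single C×H×W feature map, not the authors' code; the function names, the nested-list layout and the hand-set kernel weights (learned in a real network) are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size from ECA-Net: |(log2 C + b) / gamma| rounded to an odd integer."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def eca(feature: List[Matrix], weights: List[float]) -> List[Matrix]:
    """Efficient channel attention over one C x H x W feature map (no batch axis).

    feature: list of C channels, each an H x W matrix.
    weights: 1-D kernel of odd length k; hypothetical fixed values here,
             learned parameters in an actual network.
    """
    c = len(feature)
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # 2) 1-D cross-correlation across neighbouring channels with zero padding,
    #    followed by a sigmoid to obtain a gate in (0, 1) for each channel.
    gates = []
    for i in range(c):
        s = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < c:
                s += weights[j] * desc[idx]
        gates.append(sigmoid(s))
    # 3) Rescale each channel by its gate; the spatial size H x W is unchanged.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]
```

Because the gate only rescales whole channels, the output has exactly the input's spatial resolution, which fits a fixed-resolution design. For example, eca_kernel_size(64) gives k = 3 and eca_kernel_size(256) gives k = 5, so the attention adds only k weights per block regardless of channel count.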
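The reported results use PCKh@0.5 on MPII and PCK@0.2 on LSP. Both metrics count a visible joint as correct when its distance to the ground truth is within a fraction α of a per-person reference length: the head segment length for PCKh and, by the usual LSP convention, the torso size for PCK. A minimal sketch under those assumptions follows; the reference lengths are passed in precomputed, and the function name and data layout are hypothetical rather than taken from any official evaluation script.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(preds: Sequence[Sequence[Point]],
        gts: Sequence[Sequence[Point]],
        visible: Sequence[Sequence[bool]],
        norms: Sequence[float],
        alpha: float) -> float:
    """Percentage of Correct Keypoints, returned in percent.

    A visible joint is correct when its Euclidean distance to the ground truth
    is at most alpha * norm, where norm is the per-person reference length
    (assumed: head segment for PCKh, torso size for PCK on LSP).
    """
    correct = 0
    total = 0
    for p_joints, g_joints, vis, norm in zip(preds, gts, visible, norms):
        for (px, py), (gx, gy), v in zip(p_joints, g_joints, vis):
            if not v:
                continue  # joints without a visible annotation are excluded
            total += 1
            if math.hypot(px - gx, py - gy) <= alpha * norm:
                correct += 1
    return 100.0 * correct / total if total else 0.0
```

With a head length of 10 pixels, PCKh@0.5 accepts a prediction up to 5 pixels from the ground truth, so the threshold scales with the size of each person in the image.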


References

Wang, C., Wang, Y., Yuille, A.L.: An approach to pose-based action recognition. In: CVPR, pp. 915–922 (2013)
Zheng, L., Huang, Y., Lu, H., Yang, Y.: Pose invariant embedding for deep person re-identification. IEEE Trans. Image Process. 28, 4500–4509 (2019)
Zhang, Z.: Microsoft Kinect sensor and its effect. IEEE MultiMedia 19, 4–10 (2012)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: ECCV, pp. 103–119 (2018)
Li, Y., Chen, X., Zhu, Z., Xie, L., Huang, G., Du, D., Wang, X.: Attention-guided unified network for panoptic segmentation. In: CVPR, pp. 7019–7028 (2019)
Zhu, J., Zou, W., Xu, L., Hu, Y., Zhu, Z., Chang, M., Huang, J., Huang, G., Du, D.: Action machine: rethinking action recognition in trimmed videos. In: arXiv (2018)
Zhu, J., Zou, W., Zhu, Z., Hu, Y.: Convolutional relation network for skeleton-based action recognition. Neurocomputing 370, 109–117 (2019)
Zhu, J., Zou, W., Zhu, Z.: End-to-end video-level representation learning for action recognition. In: ICPR, pp. 645–650 (2018)
Zhu, J., Zhou, W., Zhu, Z.: Two-stream gated fusion convnets for action recognition. In: ICPR, pp. 597–602 (2018)
Tompson, J., Jain, A., LeCun, Y., Bregler, C.: Joint training of a convolutional network and a graphical model for human pose estimation. In: NIPS, pp. 1799–1807 (2014)
Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: CVPR, pp. 1653–1660 (2014)
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: ECCV, pp. 483–499 (2016)
Carreira, J., Agrawal, P., Fragkiadaki, K., Malik, J.: Human pose estimation with iterative error feedback. In: CVPR, pp. 4733–4742 (2016)
Wei, S., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR, pp. 4724–4732 (2016)
Chen, Y., Tian, Y., He, M.: Monocular human pose estimation: a survey of deep learning-based methods. Comput. Vis. Image Underst. 192, 102897 (2020)
Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: ICCV (2017)
Rafi, U., Leibe, B., Gall, J., Kostrikov, I.: An efficient convolutional network for human pose estimation. In: BMVC (2016)
Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), pp. 468–475 (2017)
Bulat, A., Tzimiropoulos, G.: Human pose estimation via convolutional part heatmap regression. In: ECCV, pp. 717–732 (2016)
Nie, X., Feng, J., Zuo, Y., Yan, S.: Human pose estimation with parsing induced learner. In: CVPR (2018)
Zhang, F., Zhu, X., Ye, M.: Fast human pose estimation. In: CVPR, pp. 3512–3521 (2019)
Ke, L., Chang, M.-C., Qi, H., Lyu, S.: Multi-scale structure-aware network for human pose estimation. In: ECCV (2018)
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: CVPR, pp. 5686–5696 (2019)
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: HigherHRNet: scale-aware representation learning for bottom-up human pose estimation. In: CVPR, pp. 5385–5394 (2020)
Cai, Y., Wang, Z., Luo, Z., Yin, B., Du, A., Wang, H., Zhang, X., Zhou, X., Zhou, E., Sun, J.: Learning delicate local representations for multi-person pose estimation. In: ECCV, pp. 455–472 (2020)
Kim, S.-T., Lee, H.J.: Lightweight stacked hourglass network for human pose estimation. Appl. Sci. 10 (2020)
Yang, L., Qin, Y., Zhang, X.: Lightweight densely connected residual network for human pose estimation. J. Real-Time Image Process. 18, 825–827 (2021)
Xiao, Y., Yu, D., Wang, X., Lv, T., Fan, Y., Wu, L.: SPCNet: spatial preserve and content-aware network for human pose estimation. In: European Conference on Artificial Intelligence, pp. 2776–2783 (2020)
Yu, C., Xiao, B., Gao, C., et al.: Lite-HRNet: a lightweight high-resolution network. In: CVPR, pp. 10440–10450 (2021)
Zhang, F., Zhu, X., Ye, M.: Fast human pose estimation. In: CVPR, pp. 3517–3526 (2019)
Ren, Z., Zhou, Y., Chen, Y., et al.: Efficient human pose estimation by maximizing fusion and high-level spatial attention. In: arXiv (2021)
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: CVPR, pp. 648–656 (2015)
Hou, L., Cao, J., Zhao, Y., et al.: P²Net: augmented parallel-pyramid net for attention guided pose estimation. In: ICPR, pp. 9658–9665 (2020)
Yang, H., Guo, L., Wu, X., et al.: Scale-aware attention-based multi-resolution representation for multi-person pose estimation. Multimedia Systems (2021)
Artacho, B., Savakis, A.: OmniPose: a multi-scale framework for multi-person pose estimation. In: arXiv (2021)
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: CVPR, pp. 6450–6458 (2017)
Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. In: CVPR, pp. 7132–7141 (2018)
Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. In: arXiv:1412.7755 (2014)
Yang, Z., He, X., Gao, J., Deng, L., Smola, A.: Stacked attention networks for image question answering. In: CVPR, pp. 21–29 (2016)
Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: CVPR, pp. 5669–5678 (2017)
Su, K., Yu, D., Xu, Z., Geng, X., Wang, C.: Multi-person pose estimation with enhanced channel-wise and spatial information. In: CVPR, pp. 5674–5682 (2019)
Yuan, Y., Fu, R., Huang, L., et al.: HRFormer: high-resolution transformer for dense prediction. In: arXiv (2021)
Huang, L., Yuan, Y., Guo, J., et al.: Interlaced sparse self-attention for semantic segmentation. In: arXiv (2019)
Luo, Z., Wang, Z., Cai, Y., et al.: Efficient human pose estimation by learning deeply aggregated representations. In: arXiv (2020)
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: CVPR (2020)
Sun, X., Xiao, B., Wei, F., et al.: Integral human pose regression. In: ECCV, pp. 536–553 (2018)
Zhang, F., Zhu, X., Dai, H., et al.: Distribution-aware coordinate representation for human pose estimation. In: CVPR, pp. 7091–7100 (2020)
Huang, J., Zhu, Z., Guo, F., Huang, G.: The devil is in the details: delving into unbiased data processing for human pose estimation. In: CVPR, pp. 5699–5708 (2020)
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: ECCV, pp. 472–487 (2018)
Zhang, Z., Tang, J., Wu, G.: Simple and lightweight human pose estimation. In: arXiv (2020)
Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: CVPR, pp. 7103–7112 (2018)
Duta, I.C., Liu, L., Zhu, F., Shao, L.: Pyramidal convolution: rethinking convolutional neural network for visual recognition. In: arXiv (2020)
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR (2014)
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (2010)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: arXiv:1412.6980 (2014)
Peng, X., Tang, Z., Yang, F., Feris, R., Metaxas, D.: Jointly optimize data augmentation and network training: adversarial data augmentation in human pose estimation. In: CVPR, pp. 2226–2234 (2018)
Su, Z., Ye, M., Zhang, G., Dai, L., Sheng, J.: Cascade feature aggregation for human pose estimation. In: arXiv:1902.07837 (2019)
Bin, Y., Cao, X., Chen, X., Ge, Y., Tai, Y., Wang, C., Li, J., Huang, F., Gao, C., Sang, N.: Adversarial semantic data augmentation for human pose estimation. In: ECCV (2020)
Chen, X., Yuille, A.L.: Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Advances in neural information processing systems (2014)
Ning, G., Zhang, Z., He, Z.: Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans. Multim. 20, 1246–1259 (2018)
Bulat, A., Kossaifi, J., Tzimiropoulos, G., Pantic, M.: Toward fast and accurate human pose estimation via soft-gated skip connections. In: FG, pp. 8–15 (2020)


Author information

Authors and Affiliations

School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Yongxiang Liu & Xiaorong Hou


Corresponding author

Correspondence to Yongxiang Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by I. Bartolini.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Liu, Y., Hou, X. Fixed-resolution representation network for human pose estimation. Multimedia Systems 28, 1597–1609 (2022). https://doi.org/10.1007/s00530-022-00919-5


Received: 17 June 2021
Accepted: 11 March 2022
Published: 01 April 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00530-022-00919-5
