IDPNet: a light-weight network and its variants for human pose estimation

The Journal of Supercomputing

Abstract

Most advanced backbones proposed so far ignore real-time and speed requirements, which are directly affected by model size. In this study, we present a light-weight model, IDPNet. As the basic block of the backbone, we design a block that places a dense layer and an identity block in parallel, and we introduce an intra-level block fusion representation head to fuse high-resolution representations. As a result, IDPNet decreases the number of parameters by 85.3% on both datasets, and the GFLOPs by 80.4% and 60.7%, respectively. To extend its usability, we propose two additional variant networks, IDPNet-Balance and IDPNet-Precision. We train and test IDPNet on the COCO keypoint detection dataset and the MPII human pose dataset without pretraining. The best accuracy on both datasets is higher than that of previous networks. During testing, the three models predict each image in 13 ms, 20 ms and 21 ms, respectively, which essentially achieves real-time performance.
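To make the figures in the abstract easier to read, the following minimal sketch converts the quoted per-image latencies into frames per second and the quoted percentage reductions into shrink factors. It rests on assumptions that the abstract does not state: the mapping of 13 ms, 20 ms and 21 ms to IDPNet, IDPNet-Balance and IDPNet-Precision is taken from the order in which the models are named, the 30 FPS real-time threshold is an illustrative convention rather than the authors' definition, and the helper names are hypothetical.

```python
# Per-image latencies quoted in the abstract (milliseconds).
# Assumption: the order follows the order in which the variants are introduced.
LATENCY_MS = {
    "IDPNet": 13.0,
    "IDPNet-Balance": 20.0,
    "IDPNet-Precision": 21.0,
}

# Assumption: 30 FPS is used here as an illustrative real-time threshold;
# the abstract does not define one.
REALTIME_FPS = 30.0


def fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def shrink_factor(reduction_percent: float) -> float:
    """Return how many times smaller a quantity is after a percentage reduction."""
    remaining = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        rate = fps(ms)
        status = "meets" if rate >= REALTIME_FPS else "misses"
        print(f"{name}: {ms:.0f} ms per image = {rate:.1f} FPS ({status} {REALTIME_FPS:.0f} FPS)")

    # Reductions quoted in the abstract: parameters, then GFLOPs on the two datasets.
    for label, pct in [("parameters", 85.3), ("GFLOPs (first)", 80.4), ("GFLOPs (second)", 60.7)]:
        print(f"{label}: reduced by {pct}% = about {shrink_factor(pct):.1f}x smaller")
```

Under these assumptions, all three latencies correspond to more than 30 frames per second, and an 85.3% reduction in parameters means the model keeps about 14.7% of the parameters of the network it is compared against.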

Funding

This research was funded by the National Key Research and Development Program of China, Grant No. 2022YFB2503405, and the Natural Science Foundation of Jilin Province, Grant No. 20210101061JC.

Author information

Contributions

Conceptualization, Huan Liu; methodology, Huan Liu; software, Huan Liu; validation, Huan Liu; formal analysis, Huan Liu; investigation, Huan Liu; resources, Huan Liu; data curation, Huan Liu; writing (original draft preparation), Huan Liu; writing (review and editing), Huan Liu; visualization, Huan Liu; supervision, Huan Liu, Jian Wu and Rui He; project administration, Jian Wu and Rui He; funding acquisition, Jian Wu and Rui He. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Rui He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data availability

The datasets used during the current study are available from https://cocodataset.org/#download and http://human-pose.mpi-inf.mpg.de/.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, H., Wu, J. & He, R. IDPNet: a light-weight network and its variants for human pose estimation. J Supercomput 80, 6169–6191 (2024). https://doi.org/10.1007/s11227-023-05691-5

  • DOI: https://doi.org/10.1007/s11227-023-05691-5
