A lightweight pose estimation network with multi-scale receptive field

Li, Shuo; Dai, Ju; Chen, Zhangmeng; Pan, Junjun

doi:10.1007/s00371-023-02953-4

Original article
Published: 25 June 2023

Volume 39, pages 3429–3440, (2023)

The Visual Computer

Shuo Li^1,2,
Ju Dai^2 (ORCID: orcid.org/0000-0002-9397-8539),
Zhangmeng Chen^1,2 &
…
Junjun Pan^1,2

365 Accesses
1 Citation
Abstract

Existing lightweight networks perform worse than large-scale models in human pose estimation because of their shallow depth and limited receptive fields. Current approaches use large convolution kernels or attention mechanisms to encourage long-range receptive field learning, at the cost of model redundancy. In this paper, we propose a novel Multi-scale Field Lightweight High-resolution Network (MFite-HRNet) for human pose estimation. Specifically, our model mainly consists of two lightweight blocks, a Multi-scale Receptive Field Block (MRB) and a Large Receptive Field Block (LRB), which learn informative multi-scale and long-range spatial context. The MRB uses grouped depthwise dilated convolutions with varied dilation rates to extract multi-scale spatial relationships from different feature maps. The LRB uses large depthwise convolution kernels to model long-range spatial context in the low-level features. We apply MFite-HRNet to single-person and multi-person pose estimation tasks. Experiments on the COCO, MPII, and CrowdPose datasets demonstrate that our network outperforms current state-of-the-art lightweight networks in both single-person and multi-person pose estimation. The source code will be publicly available at https://github.com/lskdje/MFite-HRNet.git.
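The trade-off the abstract describes can be made concrete with a little arithmetic. The sketch below is not taken from the paper: the channel count, the 3×3 base kernel, the dilation rates (1, 2, 3, 4), and the 7×7 "large" kernel are all assumed values chosen only for illustration. It relies on two standard facts: a k×k kernel with dilation d spans k + (k - 1)(d - 1) pixels per side, and a depthwise convolution has one k×k filter per channel.

```python
# Illustrative arithmetic only. CHANNELS, DILATIONS, BASE_KERNEL and
# LARGE_KERNEL are assumptions for this sketch, not values from the paper.

def effective_kernel(k: int, dilation: int) -> int:
    """Side length of the area covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Number of weights in a 2D convolution (bias ignored)."""
    return (c_in // groups) * c_out * k * k


def depthwise_params(channels: int, k: int) -> int:
    """Depthwise convolution: groups == channels, one k x k filter per channel."""
    return conv_params(channels, channels, k, groups=channels)


CHANNELS = 40             # assumed feature width
DILATIONS = (1, 2, 3, 4)  # assumed dilation rate per channel group
BASE_KERNEL = 3           # assumed kernel size for the dilated branches
LARGE_KERNEL = 7          # assumed kernel size for the large-kernel branch

if __name__ == "__main__":
    # Multi-scale idea: each channel group sees a different spatial extent,
    # yet dilation adds no weights, so the total stays at CHANNELS * 3 * 3.
    for d in DILATIONS:
        print(f"dilation {d}: covers {effective_kernel(BASE_KERNEL, d)} pixels per side")
    print("dilated depthwise 3x3 weights:", depthwise_params(CHANNELS, BASE_KERNEL))

    # Large-kernel idea: a depthwise 7x7 is far cheaper than a dense 7x7.
    print("depthwise 7x7 weights:", depthwise_params(CHANNELS, LARGE_KERNEL))
    print("dense 7x7 weights:", conv_params(CHANNELS, CHANNELS, LARGE_KERNEL))
```

Under these assumed settings the dilated branches cover 3 to 9 pixels per side with the same weight count as a plain depthwise 3×3, which is the sense in which dilation widens the receptive field without adding parameters.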

Data Availability

Data are available on reasonable request from the corresponding author.

Notes

Small HRNet is available at https://github.com/HRNet/HRNet-Semantic-Segmentation. It simply reduces the depths and widths of the original HRNet.

Acknowledgements

This research is supported by the National Key R&D Program of China (No. 2022ZD0115902) and the National Natural Science Foundation of China (Nos. 62102208, 62272017, U20A20195, 62172437).

Author information

Authors and Affiliations

Beihang University, Haidian, Beijing, 100191, China
Shuo Li, Zhangmeng Chen & Junjun Pan
Peng Cheng Laboratory, Nanshan, Shenzhen, 518000, China
Shuo Li, Ju Dai, Zhangmeng Chen & Junjun Pan

Corresponding authors

Correspondence to Ju Dai or Junjun Pan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Dai, J., Chen, Z. et al. A lightweight pose estimation network with multi-scale receptive field. Vis Comput 39, 3429–3440 (2023). https://doi.org/10.1007/s00371-023-02953-4

Accepted: 09 June 2023
Published: 25 June 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00371-023-02953-4
