Abstract
Recent research on human pose estimation exploits complex structures to improve performance on benchmark datasets, ignoring the resource overhead and inference speed when the model is actually deployed. In this paper, we lighten the computation cost and parameters of the deconvolution head network in SimpleBaseline and introduce an attention mechanism that utilizes original, inter-level, and intra-level information to intensify the accuracy. Additionally, we propose a novel loss function called heatmap weighting loss, which generates weights for each pixel on the heatmap that makes the model more focused on keypoints. Experiments demonstrate our method achieves a balance between performance, resource volume, and inference speed. Specifically, our method can achieve 65.3 AP score on COCO test-dev, while the inference speed is 55 FPS and 18 FPS on the mobile GPU and CPU, respectively.
Also with China’s Belt & Road Joint Lab on Measur. & Contr. Tech., and National Key Lab of Multi-Spectral Information Intelligent Processing Technology.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on computer Vision and Pattern Recognition, pp. 3686–3693 (2014)
Bridgeman, L., Volino, M., Guillemaut, J.Y., Hilton, A.: Multi-person 3D pose estimation and tracking in sports. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019
Cai, Y., et al.: Learning delicate local representations for multi-person pose estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 455–472. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_27
Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7103–7112 (2018)
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
Contributors, M.: Openmmlab pose estimation toolbox and benchmark (2020). https://github.com/open-mmlab/mmpose
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: Ghostnet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Howard, A., et al.: Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Kumarapu, L., Mukherjee, P.: Animepose: multi-person 3D pose estimation and animation. Pattern Recogn. Lett. 147, 16–24 (2021)
Li, T., et al.: Automatic timed up-and-go sub-task segmentation for Parkinson’s disease patients using video-based activity classification. IEEE Trans. Neural Syst. Rehabil. Eng. 26(11), 2189–2199 (2018). https://doi.org/10.1109/TNSRE.2018.2875738
Li, W., et al.: Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148 (2019)
Li, X., Wang, W., Hu, X., Yang, J.: Selective kernel networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 510–519 (2019)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Jan, Y., Sohel, F., Shiratuddin, M.F., Wong, K.W.: WNet: joint multiple head detection and head pose estimation from a spectator crowd image. In: Carneiro, G., You, S. (eds.) ACCV 2018. LNCS, vol. 11367, pp. 484–493. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21074-8_38
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29
Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: 2013 IEEE International Conference on Computer Vision, pp. 3487–3494 (2013). https://doi.org/10.1109/ICCV.2013.433
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656 (2015)
Toshev, A., Szegedy, C.: Deeppose: human pose estimation via deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660 (2014)
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Xiang, X., Chen, W., Zeng, D.: Intelligent target tracking and shooting system with mean shift. In: 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications, pp. 417–421. IEEE (2008)
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 472–487. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_29
Xu, L., et al.: Vipnas: efficient video pose estimation via neural architecture search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16072–16081 (2021)
Yu, C., et al.: Lite-HrNet: a lightweight high-resolution network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10440–10450 (2021)
Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
Acknowledgement
This research was supported by HUST Independent Innovation Research Fund (2021XXJS096), Sichuan University Interdisciplinary Innovation Research Fund (RD-03-202108), “Natural Science Fund of Hubei Province (2022Q252)”, “Alibaba Inno. Res. program (CRAQ7WHZ11220001-20978282)”, and the Key Lab of Image Processing and Intelligent Control, Ministry of Education, China.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, S., Xiang, X. (2023). Lightweight Human Pose Estimation Using Loss Weighted by Target Heatmap. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-031-37660-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37659-7
Online ISBN: 978-3-031-37660-3
eBook Packages: Computer ScienceComputer Science (R0)