TBSA-Net: A Temperature-Based Structure-Aware Hand Pose Estimation Model in Infrared Images

  • Conference paper
Green, Pervasive, and Cloud Computing (GPC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14504)

Abstract

In recent years, 2D Hand Pose Estimation (HPE) on RGB images has been studied in depth and has made significant progress. HPE on infrared images, however, has received limited attention. Because infrared images carry limited channel information and are highly correlated with temperature, models designed for RGB images may lose accuracy when applied to them. Our experiments reveal that the temperature distribution of the human hand in infrared images exhibits significant regularity. In this paper, we propose the Temperature-Based Hand Judgement Model (TB-HJM), which leverages this characteristic. During training, a higher penalty is applied when the temperature distribution of the predicted pose does not align with the actual temperature distribution, and a lower penalty when it does. During testing, TB-HJM selects the hand proposal that most closely matches the temperature distribution as the final output. Additionally, to compensate for the lack of visual information in infrared images, we use PBNHead and a GCN Refine Module to merge structural information into the network and preserve model accuracy. Experimental results show that our model outperforms the benchmark model (HRNet) by 1.72% in AUC and lowers the end-point error (EPE) from 3.02 to 2.38, an improvement of 0.6448, achieving state-of-the-art performance on our infrared hand dataset.
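The abstract does not give the internals of TB-HJM, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions: a penalty that grows when the temperatures at the predicted keypoints look implausible, and a test-time step that keeps the best-scoring proposal. The function names, the exponential score, the reference temperature profile `ref_profile`, and the weight `alpha` are all hypothetical and are not taken from the paper. The `mean_epe` helper shows the usual definition of EPE as the mean Euclidean distance between predicted and ground-truth keypoints.

```python
import numpy as np


def mean_epe(pred, gt):
    """Mean end-point error: average Euclidean distance per keypoint."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def temperature_score(image, keypoints, ref_profile):
    """Hypothetical plausibility score in (0, 1].

    Samples the thermal image at each (x, y) keypoint and compares the
    values with an assumed per-keypoint reference profile. A score of 1
    means the sampled temperatures match the reference exactly.
    """
    kp = np.asarray(keypoints, dtype=float).round().astype(int)
    xs, ys = kp[:, 0], kp[:, 1]
    temps = np.asarray(image, dtype=float)[ys, xs]
    diff = np.abs(temps - np.asarray(ref_profile, dtype=float)).mean()
    return float(np.exp(-diff))


def temperature_weighted_loss(pred, gt, plausibility, alpha=1.0):
    """Squared-error loss scaled up when plausibility is low.

    Assumed form: base * (1 + alpha * (1 - plausibility)), so a fully
    plausible pose (plausibility = 1) keeps the base loss unchanged.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    base = float(((pred - gt) ** 2).sum(axis=-1).mean())
    return base * (1.0 + alpha * (1.0 - plausibility))


def select_proposal(proposals, score_fn):
    """Return the proposal with the highest score, and that score."""
    scores = [score_fn(p) for p in proposals]
    best = int(np.argmax(scores))
    return proposals[best], scores[best]


if __name__ == "__main__":
    # Toy 4x4 "thermal image": background at 20, one warm pixel at 34.
    image = np.full((4, 4), 20.0)
    image[1, 2] = 34.0  # row y=1, column x=2
    ref = [34.0]
    proposals = [[(0, 0)], [(2, 1)]]
    best, score = select_proposal(
        proposals, lambda p: temperature_score(image, p, ref)
    )
    print(best, score)
    print(mean_epe([[0, 0], [3, 4]], [[0, 0], [0, 0]]))
```

In this toy setup the second proposal lands on the warm pixel and is selected. A real system would replace `temperature_score` with a learned judgement model and would need a differentiable penalty for training.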

Acknowledgements

This research was supported by the National Key Research and Development Program of China under Grant No. 2022ZD0115403, National Natural Science Foundation of China under Grant No. 62072236, and the Fundamental Research Funds for the Central Universities under Grant No. 56XCA2205404.

Author information

Corresponding author

Correspondence to Yunlong Zhao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xia, H., Li, Y., Liu, C., Zhao, Y. (2024). TBSA-Net: A Temperature-Based Structure-Aware Hand Pose Estimation Model in Infrared Images. In: Jin, H., Yu, Z., Yu, C., Zhou, X., Lu, Z., Song, X. (eds) Green, Pervasive, and Cloud Computing. GPC 2023. Lecture Notes in Computer Science, vol 14504. Springer, Singapore. https://doi.org/10.1007/978-981-99-9896-8_16

  • DOI: https://doi.org/10.1007/978-981-99-9896-8_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9895-1

  • Online ISBN: 978-981-99-9896-8

  • eBook Packages: Computer Science, Computer Science (R0)
