
FSR: a feature self-regulation network for partially occluded hand pose estimation

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Hand pose estimation is important for many applications, but performance degrades when the hand is interacting with objects. To alleviate the influence of unknown objects, we propose a novel network that makes full use of the multimodal information in RGB-D images. The network uses the color features and/or the depth features selectively, according to a prediction of whether the hand is severely or slightly occluded. We also use a new principal feature enhancement structure with an irrelevant-feature weakening strategy to make the pose estimation more accurate. The FHAD dataset is used in the experiments for performance evaluation. For the ‘action-split’ and ‘subject-split’ data groups, the obtained mean joint errors are 10.63 mm and 10.61 mm, respectively; these results are better than those of the state-of-the-art methods. For the ‘object-split’ data group, the obtained mean joint error is 17.42 mm, which is on par with the best results so far. The experimental results show the effectiveness of the proposed architecture.
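The selective use of color and depth features described in the abstract can be caricatured as a convex gating of the two feature streams by a predicted occlusion score. Everything below (the function name, the direction of the blend, the stand-in feature vectors) is a hypothetical illustration of the idea, not the paper's actual learned self-regulation mechanism:

```python
import numpy as np

def gate_features(color_feat, depth_feat, severe_occlusion_prob):
    """Blend color and depth features by a predicted occlusion score.

    Hypothetical sketch: `severe_occlusion_prob` stands in for the
    output of an occlusion classifier; the real FSR network learns
    its feature regulation end-to-end rather than using a fixed rule.
    """
    w = float(severe_occlusion_prob)  # assumed in [0, 1]
    # Lean on the depth stream when occlusion is judged severe,
    # on the color stream otherwise.
    return w * depth_feat + (1.0 - w) * color_feat

# Stand-in feature vectors for illustration only.
color = np.full(4, 2.0)
depth = np.zeros(4)
fused = gate_features(color, depth, 0.75)  # mostly depth-driven
```

With `severe_occlusion_prob = 0.75`, three quarters of the fused vector comes from the depth stream; a soft gate like this degrades gracefully compared with a hard modality switch.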

References

  1. Ghaderi, Z., Khotanlou, H.: Weakly supervised pairwise Frank-Wolfe algorithm to recognize a sequence of human actions in RGB-D videos. SIViP 13(8), 1619–1627 (2019)

  2. Zhang, Y.-X., Zhang, H.-B., Du, J.-X., Lei, Q., Yang, L., Zhong, B.: RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition. SIViP 15(2), 1379–1386 (2021)

  3. Deng, X., Zhang, Y., Shi, J., Zhu, Y., Cheng, D., Zuo, D., Cui, Z., Tan, P., Chang, L., Wang, H.: Hand pose understanding with large-scale photo-realistic rendering dataset. IEEE Trans. Image Process. 30, 4275–4290 (2021)

  4. Ge, L., Cai, Y., Weng, J., Yuan, J.: Hand PointNet: 3D hand pose estimation using point sets. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8417–8426 (2018)

  5. Neverova, N., Wolf, C., Nebout, F., Taylor, G.W.: Hand pose estimation through semi-supervised and weakly-supervised learning. Comput. Vis. Image Underst. 164, 56–67 (2017)

  6. Oberweger, M., Lepetit, V.: DeepPrior++: Improving fast and accurate 3D hand pose estimation. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 585–594 (2017)

  7. Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM Trans. Graph. 33(5), 1–10 (2014)

  8. Xu, C., Govindarajan, L.N., Zhang, Y., Cheng, L.: Lie-X: depth image based articulated object pose estimation, tracking, and action recognition on Lie groups. Int. J. Comput. Vision 123(3), 454–478 (2017)

  9. Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: IEEE International Conference on Computer Vision (ICCV), pp. 4903-4911 (2017)

  10. Cai, Y., Ge, L., Cai, J.: 3D hand pose estimation using synthetic data and weakly labeled RGB images. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3739–3753 (2021)

  11. Wang, Y., Zhang, B., Peng, C.: Srhandnet: Real-time 2D hand pose estimation with simultaneous region localization. IEEE Trans. Image Process. 29, 2977–2986 (2019)

  12. Deng, X., Zhu, Y., Zhang, Y., Cui, Z., Tan, P., Qu, W., Wang, H.: Weakly supervised learning for single depth-based hand shape recovery. IEEE Trans. Image Process. 30, 532–545 (2020)

  13. Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Chang, J. Y., Lee, K. M., Kim, T. K.: Depth-based 3D hand pose estimation: From current achievements to future goals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2636-2645 (2018)

  14. Choi, C., Ho Yoon, S., Chen, C. N., Ramani, K.: Robust hand pose estimation during the interaction with an unknown object. In: IEEE International Conference on Computer Vision (ICCV), pp. 3123-3132 (2017)

  15. Oberweger, M., Wohlhart, P., Lepetit, V.: Generalized feedback loop for joint hand-object pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 1898–1912 (2020)

  16. Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M. J., Laptev, I., Schmid, C.: Learning joint reconstruction of hands and manipulated objects. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11807-11816 (2019)

  17. Kulon, D., Guler, R. A., Kokkinos, I., Bronstein, M. M., Zafeiriou, S.: Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4990-5000 (2020)

  18. Boukhayma, A., Bem, R. D., Torr, P. H.: 3D hand shape and pose from images in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10843-10852 (2019)

  19. Hampali, S., Oberweger, M., Rad, M., Lepetit, V.: HO-3D: A multi-user, multi-object dataset for joint 3D hand-object pose estimation (2019). arXiv:1907.01481

  20. Doosti, B., Naha, S., Mirbagheri, M., Crandall, D. J.: HOPE-Net: A graph-based model for hand-object pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6608-6617 (2020)

  21. Baek, S., Kim, K. I., Kim, T. K.: Weakly-supervised domain adaptation via GAN and mesh model for estimating 3D hand poses interacting objects. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6121-6131 (2020)

  22. Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., Theobalt, C.: Real-time hand tracking under occlusion from an egocentric RGB-D sensor. In: IEEE International Conference on Computer Vision (ICCV), pp. 1284-1293 (2017)

  23. Panteleris, P., Kyriazis, N., Argyros, A. A.: 3D tracking of human hands in interaction with unknown objects. In: British Machine Vision Conference (BMVC), pp. 123.1-123.12 (2015)

  24. Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.: Capturing hands in action using discriminative salient points and physics simulation. Int. J. Comput. Vision 118(2), 172–193 (2016)

  25. Garcia-Hernando, G., Yuan, S., Baek, S., Kim, T. K.: First-person hand action benchmark with RGB-D videos and 3D hand pose annotations. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409-419 (2018)

  26. Sridhar, S., Mueller, F., Zollhöfer, M., Casas, D., Oulasvirta, A., Theobalt, C.: Real-time joint tracking of a hand manipulating an object from RGB-D input. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 294-310 (2016)

  27. Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., Theobalt, C.: Ganerated hands for real-time 3D hand tracking from monocular RGB. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 49-59 (2018)

  28. Goudie, D., Galata, A.: 3D hand-object pose estimation from depth with convolutional neural networks. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 406-413 (2017)

  29. Chen, Y., Tu, Z., Kang, D., Chen, R., Bao, L., Zhang, Z., Yuan, J.: Joint hand-object 3D reconstruction from a single image with cross-branch feature fusion. IEEE Trans. Image Process. 30, 4008–4021 (2021)

  30. Armagan, A., Garcia-Hernando, G., Baek, S., Hampali, S., Rad, M., Zhang, Z., Xie, S., Chen, M. S., Zhang, B., Xiong, F., Xiao, Y.: Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3D hand pose estimation under hand-object interaction (2020). arXiv:2003.13764

  31. de La Gorce, M., Fleet, D.J., Paragios, N.: Model-based 3D hand pose estimation from monocular video. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), 1793–1805 (2011)

  32. Spurr, A., Song, J., Park, S., Hilliges, O.: Cross-modal deep variational hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 89-98 (2018)

  33. Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3D hand pose estimation from monocular RGB images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 666-682 (2018)

  34. Iqbal, U., Molchanov, P., Breuel Juergen Gall, T., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 118-134 (2018)

  35. Rad, M., Oberweger, M., Lepetit, V.: Domain transfer for 3D pose estimation from color images without manual annotations. In: Asian Conference on Computer Vision (ACCV), pp. 69-84 (2018)

  36. Yang, L., Yao, A.: Disentangling latent hands for image synthesis and pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9877-9886 (2019)

  37. Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., Yuan, J.: 3D hand shape and pose estimation from a single RGB image. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10833-10842 (2019)

  38. Liu, J., Ding, H., Shahroudy, A., Duan, L.Y., Jiang, X., Wang, G., Kot, A.C.: Feature boosting network for 3D pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 494–501 (2019)

  39. Zhou, Y., Habermann, M., Xu, W., Habibie, I., Theobalt, C., Xu, F.: Monocular real-time hand shape and motion capture using multi-modal data. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5346-5355 (2020)

  40. Tekin, B., Bogo, F., Pollefeys, M.: H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511-4520 (2019)

  41. Zhou, Y., Lu, J., Du, K., Lin, X., Sun, Y., Ma, X.: HBE: Hand branch ensemble network for real-time 3D hand pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 501-516 (2018)

  42. Baek, S., In Kim, K., Kim, T. K.: Augmented skeleton space transfer for depth-based hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8330-8339 (2018)

  43. Wan, C., Probst, T., Van Gool, L., Yao, A.: Dense 3D regression for hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5147-5156 (2018)

  44. Moon, G., Yong Chang, J., Mu Lee, K.: V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5079-5088 (2018)

  45. Ge, L., Liang, H., Yuan, J., Thalmann, D.: 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1991-2000 (2017)

  46. Du, K., Lin, X., Sun, Y., Ma, X.: CrossInfoNet: multi-task information sharing based hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9896-9905 (2019)

  47. Wan, C., Probst, T., Gool, L. V., Yao, A.: Self-supervised 3D hand pose estimation through training by fitting. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10853-10862 (2019)

  48. Xiong, F., Zhang, B., Xiao, Y., Cao, Z., Yu, T., Zhou, J. T., Yuan, J.: A2J: Anchor-to-Joint regression network for 3D articulated pose estimation from a single depth image. In: IEEE International Conference on Computer Vision (ICCV), pp. 793-802 (2019)

  49. Malik, J., Abdelaziz, I., Elhayek, A., Shimada, S., Ali, S. A., Golyanik, V., Theobalt, C., Stricker, D.: HandVoxNet: Deep voxel-based network for 3D hand shape and pose estimation from a single depth map. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7113-7122 (2020)

  50. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5693-5703 (2019)

  51. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132-7141 (2018)

  52. Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Chang, J. Y., Lee, K. M., Kim, T. K.: Depth-based 3D hand pose estimation: From current achievements to future goals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2636-2645 (2018)

  53. Hasson, Y., Tekin, B., Bogo, F., Laptev, I., Pollefeys, M., Schmid, C.: Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 571-580 (2020)

Funding

The work was supported by the National Natural Science Foundation of China [grant numbers 61873046, U1708263].

Author information

Authors and Affiliations

Authors

Contributions

XiangBo Lin contributed to conceptualization, methodology, formal analysis, investigation, resources, writing—original draft, writing—review and editing, visualization, supervision, project administration and funding acquisition. YiBo Li provided software and was involved in methodology, validation, formal analysis, investigation, data curation, writing—original draft, editing and visualization. YiDan Zhou provided software and contributed to validation and data curation. Yi Sun was involved in conceptualization, methodology, formal analysis, investigation, resources, writing—original draft, writing—review and editing, supervision, project administration and funding acquisition. XiaoHong Ma contributed to conceptualization, methodology, formal analysis, investigation and supervision.

Corresponding author

Correspondence to Yi Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Availability of data and material

The datasets generated or analyzed during the current study are available online, referring to [25].

Code availability

The codes generated or analyzed during the current study are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, X., Li, Y., Zhou, Y. et al. FSR: a feature self-regulation network for partially occluded hand pose estimation. SIViP 16, 1187–1195 (2022). https://doi.org/10.1007/s11760-021-02069-z
