
FSR: a feature self-regulation network for partially occluded hand pose estimation

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Hand pose estimation is important for many applications, but performance degrades when the hand is interacting with objects. To alleviate the influence of unknown objects, we propose a novel network that makes full use of the multimodal information in RGB-D images. The network uses the color features and/or the depth features selectively, according to a prediction of whether the hand is severely or slightly occluded. We also use a new principal feature enhancement structure with an irrelevant-feature weakening strategy to make the pose estimation more accurate. The FHAD dataset is used in the experiments for performance evaluation. For the ‘action-split’ and ‘subject-split’ data groups, the obtained mean joint errors are 10.63 mm and 10.61 mm, respectively; these results are better than those of the state-of-the-art methods. For the ‘object-split’ data group, the obtained mean joint error is 17.42 mm, which is on par with the best results so far. The experimental results show the effectiveness of the proposed architecture.
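The selective use of color and depth features described in the abstract can be caricatured as a convex gating of the two feature streams by a predicted occlusion score. Everything below (the function name, the direction of the blend, the stand-in feature vectors) is a hypothetical illustration of the idea, not the paper's actual learned self-regulation mechanism:

```python
import numpy as np

def gate_features(color_feat, depth_feat, severe_occlusion_prob):
    """Blend color and depth features by a predicted occlusion score.

    Hypothetical sketch: `severe_occlusion_prob` stands in for the
    output of an occlusion classifier; the real FSR network learns
    its feature regulation end-to-end rather than using a fixed rule.
    """
    w = float(severe_occlusion_prob)  # assumed in [0, 1]
    # Lean on the depth stream when occlusion is judged severe,
    # on the color stream otherwise.
    return w * depth_feat + (1.0 - w) * color_feat

# Stand-in feature vectors for illustration only.
color = np.full(4, 2.0)
depth = np.zeros(4)
fused = gate_features(color, depth, 0.75)  # mostly depth-driven
```

With `severe_occlusion_prob = 0.75`, three quarters of the fused vector comes from the depth stream; a soft gate like this degrades gracefully compared with a hard modality switch.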

References

  1. Ghaderi, Z., Khotanlou, H.: Weakly supervised pairwise Frank-Wolfe algorithm to recognize a sequence of human actions in RGB-D videos. SIViP 13(8), 1619–1627 (2019)

  2. Zhang, Y.-X., Zhang, H.-B., Du, J.-X., Lei, Q., Yang, L., Zhong, B.: RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition. SIViP 15(2), 1379–1386 (2021)

  3. Deng, X., Zhang, Y., Shi, J., Zhu, Y., Cheng, D., Zuo, D., Cui, Z., Tan, P., Chang, L., Wang, H.: Hand pose understanding with large-scale photo-realistic rendering dataset. IEEE Trans. Image Process. 30, 4275–4290 (2021)

  4. Ge, L., Cai, Y., Weng, J., Yuan, J.: Hand PointNet: 3D hand pose estimation using point sets. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8417–8426 (2018)

  5. Neverova, N., Wolf, C., Nebout, F., Taylor, G.W.: Hand pose estimation through semi-supervised and weakly-supervised learning. Comput. Vis. Image Underst. 164, 56–67 (2017)

  6. Oberweger, M., Lepetit, V.: DeepPrior++: Improving fast and accurate 3D hand pose estimation. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 585–594 (2017)

  7. Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM Trans. Graph. 33(5), 1–10 (2014)

  8. Xu, C., Govindarajan, L.N., Zhang, Y., Cheng, L.: Lie-X: depth image based articulated object pose estimation, tracking, and action recognition on Lie groups. Int. J. Comput. Vision 123(3), 454–478 (2017)

  9. Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: IEEE International Conference on Computer Vision (ICCV), pp. 4903-4911 (2017)

  10. Cai, Y., Ge, L., Cai, J.: 3D hand pose estimation using synthetic data and weakly labeled RGB images. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3739–3753 (2021)

  11. Wang, Y., Zhang, B., Peng, C.: Srhandnet: Real-time 2D hand pose estimation with simultaneous region localization. IEEE Trans. Image Process. 29, 2977–2986 (2019)

  12. Deng, X., Zhu, Y., Zhang, Y., Cui, Z., Tan, P., Qu, W., Wang, H.: Weakly supervised learning for single depth-based hand shape recovery. IEEE Trans. Image Process. 30, 532–545 (2020)

  13. Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Chang, J. Y., Lee, K. M., Kim, T. K.: Depth-based 3D hand pose estimation: From current achievements to future goals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2636-2645 (2018)

  14. Choi, C., Ho Yoon, S., Chen, C. N., Ramani, K.: Robust hand pose estimation during the interaction with an unknown object. In: IEEE International Conference on Computer Vision (ICCV), pp. 3123-3132 (2017)

  15. Oberweger, M., Wohlhart, P., Lepetit, V.: Generalized feedback loop for joint hand-object pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 1898–1912 (2020)

  16. Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M. J., Laptev, I., Schmid, C.: Learning joint reconstruction of hands and manipulated objects. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11807-11816 (2019)

  17. Kulon, D., Guler, R. A., Kokkinos, I., Bronstein, M. M., Zafeiriou, S.: Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4990-5000 (2020)

  18. Boukhayma, A., Bem, R. D., Torr, P. H.: 3D hand shape and pose from images in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10843-10852 (2019)

  19. Hampali, S., Oberweger, M., Rad, M., Lepetit, V.: HO-3D: A multi-user, multi-object dataset for joint 3D hand-object pose estimation (2019). arXiv:1907.01481

  20. Doosti, B., Naha, S., Mirbagheri, M., Crandall, D. J.: HOPE-Net: A graph-based model for hand-object pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6608-6617 (2020)

  21. Baek, S., Kim, K. I., Kim, T. K.: Weakly-supervised domain adaptation via GAN and mesh model for estimating 3D hand poses interacting objects. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6121-6131 (2020)

  22. Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., Theobalt, C.: Real-time hand tracking under occlusion from an egocentric RGB-D sensor. In: IEEE International Conference on Computer Vision (ICCV), pp. 1284-1293 (2017)

  23. Panteleris, P., Kyriazis, N., Argyros, A. A.: 3D tracking of human hands in interaction with unknown objects. In: British Machine Vision Conference (BMVC), pp. 123.1-123.12 (2015)

  24. Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.: Capturing hands in action using discriminative salient points and physics simulation. Int. J. Comput. Vision 118(2), 172–193 (2016)

  25. Garcia-Hernando, G., Yuan, S., Baek, S., Kim, T. K.: First-person hand action benchmark with RGB-D videos and 3D hand pose annotations. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409-419 (2018)

  26. Sridhar, S., Mueller, F., Zollhöfer, M., Casas, D., Oulasvirta, A., Theobalt, C.: Real-time joint tracking of a hand manipulating an object from RGB-D input. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 294-310 (2016)

  27. Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., Theobalt, C.: Ganerated hands for real-time 3D hand tracking from monocular RGB. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 49-59 (2018)

  28. Goudie, D., Galata, A.: 3D hand-object pose estimation from depth with convolutional neural networks. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 406-413 (2017)

  29. Chen, Y., Tu, Z., Kang, D., Chen, R., Bao, L., Zhang, Z., Yuan, J.: Joint hand-object 3D reconstruction from a single image with cross-branch feature fusion. IEEE Trans. Image Process. 30, 4008–4021 (2021)

  30. Armagan, A., Garcia-Hernando, G., Baek, S., Hampali, S., Rad, M., Zhang, Z., Xie, S., Chen, M. S., Zhang, B., Xiong, F., Xiao, Y.: Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3D hand pose estimation under hand-object interaction (2020). arXiv:2003.13764

  31. de La Gorce, M., Fleet, D.J., Paragios, N.: Model-based 3D hand pose estimation from monocular video. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), 1793–1805 (2011)

  32. Spurr, A., Song, J., Park, S., Hilliges, O.: Cross-modal deep variational hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 89-98 (2018)

  33. Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3D hand pose estimation from monocular RGB images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 666-682 (2018)

  34. Iqbal, U., Molchanov, P., Breuel Juergen Gall, T., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 118-134 (2018)

  35. Rad, M., Oberweger, M., Lepetit, V.: Domain transfer for 3D pose estimation from color images without manual annotations. In: Asian Conference on Computer Vision (ACCV), pp. 69-84 (2018)

  36. Yang, L., Yao, A.: Disentangling latent hands for image synthesis and pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9877-9886 (2019)

  37. Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., Yuan, J.: 3D hand shape and pose estimation from a single RGB image. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10833-10842 (2019)

  38. Liu, J., Ding, H., Shahroudy, A., Duan, L.Y., Jiang, X., Wang, G., Kot, A.C.: Feature boosting network for 3D pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 494–501 (2019)

  39. Zhou, Y., Habermann, M., Xu, W., Habibie, I., Theobalt, C., Xu, F.: Monocular real-time hand shape and motion capture using multi-modal data. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5346-5355 (2020)

  40. Tekin, B., Bogo, F., Pollefeys, M.: H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511-4520 (2019)

  41. Zhou, Y., Lu, J., Du, K., Lin, X., Sun, Y., Ma, X.: HBE: Hand branch ensemble network for real-time 3D hand pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 501-516 (2018)

  42. Baek, S., In Kim, K., Kim, T. K.: Augmented skeleton space transfer for depth-based hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8330-8339 (2018)

  43. Wan, C., Probst, T., Van Gool, L., Yao, A.: Dense 3D regression for hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5147-5156 (2018)

  44. Moon, G., Yong Chang, J., Mu Lee, K.: V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5079-5088 (2018)

  45. Ge, L., Liang, H., Yuan, J., Thalmann, D.: 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1991-2000 (2017)

  46. Du, K., Lin, X., Sun, Y., Ma, X.: CrossInfoNet: multi-task information sharing based hand pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9896-9905 (2019)

  47. Wan, C., Probst, T., Gool, L. V., Yao, A.: Self-supervised 3D hand pose estimation through training by fitting. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10853-10862 (2019)

  48. Xiong, F., Zhang, B., Xiao, Y., Cao, Z., Yu, T., Zhou, J. T., Yuan, J.: A2J: Anchor-to-Joint regression network for 3D articulated pose estimation from a single depth image. In: IEEE International Conference on Computer Vision (ICCV), pp. 793-802 (2019)

  49. Malik, J., Abdelaziz, I., Elhayek, A., Shimada, S., Ali, S. A., Golyanik, V., Theobalt, C., Stricker, D.: HandVoxNet: Deep voxel-based network for 3D hand shape and pose estimation from a single depth map. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7113-7122 (2020)

  50. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5693-5703 (2019)

  51. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132-7141 (2018)

  52. Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Chang, J. Y., Lee, K. M., Kim, T. K.: Depth-based 3D hand pose estimation: From current achievements to future goals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2636-2645 (2018)

  53. Hasson, Y., Tekin, B., Bogo, F., Laptev, I., Pollefeys, M., Schmid, C.: Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 571-580 (2020)

Funding

The work was supported by the National Natural Science Foundation of China [grant numbers 61873046, U1708263].

Author information

Authors and Affiliations

Authors

Contributions

XiangBo Lin contributed to conceptualization, methodology, formal analysis, investigation, resources, writing—original draft, writing—review and editing, visualization, supervision, project administration and funding acquisition. YiBo Li provided software and was involved in methodology, validation, formal analysis, investigation, data curation, writing—original draft, editing and visualization. YiDan Zhou provided software and contributed to validation and data curation. Yi Sun was involved in conceptualization, methodology, formal analysis, investigation, resources, writing—original draft, writing—review and editing, supervision, project administration and funding acquisition. XiaoHong Ma contributed to conceptualization, methodology, formal analysis, investigation and supervision.

Corresponding author

Correspondence to Yi Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Availability of data and material

The datasets generated or analyzed during the current study are available online, referring to [25].

Code availability

The codes generated or analyzed during the current study are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, X., Li, Y., Zhou, Y. et al. FSR: a feature self-regulation network for partially occluded hand pose estimation. SIViP 16, 1187–1195 (2022). https://doi.org/10.1007/s11760-021-02069-z
