Highly efficient gaze estimation method using online convolutional re-parameterization

Multimedia Tools and Applications

Abstract

Existing gaze estimation methods with multi-branch structures improve accuracy significantly, but at the cost of extra training overhead and slow inference. In this paper, we propose a hybrid model that combines online re-parameterization structures with improved transformer encoders for precise and efficient gaze estimation, significantly reducing training requirements while accelerating inference. Our multi-branch model employs online re-parameterization structures to extract multi-scale gaze-related features; these structures can be equivalently transformed into single-branch form during both training and inference, yielding substantial savings in cost and gains in speed. Moreover, we employ transformer encoders to enhance the global correlation of gaze-related features. Conventional position embeddings slow encoder inference, but simply removing them degrades performance; we therefore substitute zero-padding position embeddings, which let the encoders learn absolute position information without introducing additional inference cost. Our experimental results demonstrate that the proposed model achieves improved performance on multiple datasets while reducing training time by 57%, cutting memory usage by 36%, and accelerating inference by 26%.
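
To make the equivalence concrete, the sketch below merges a RepVGG-style multi-branch block (a 3x3 convolution, a 1x1 convolution, and an identity shortcut) into one mathematically identical 3x3 convolution. This is a minimal PyTorch illustration of the general re-parameterization technique under simplified assumptions (no batch normalization, equal channel counts), not the authors' exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Multi-branch block: y = conv3x3(x) + conv1x1(x) + x
channels = 8
conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
conv1x1 = nn.Conv2d(channels, channels, 1, bias=True)

# Fold every branch into a single 3x3 kernel.
# 1) Pad the 1x1 kernel to 3x3 so it acts on the center tap only.
weight = conv3x3.weight.data + F.pad(conv1x1.weight.data, [1, 1, 1, 1])
# 2) The identity shortcut is a 3x3 kernel with a single 1 at the
#    center of each channel's own filter.
identity = torch.zeros_like(weight)
for c in range(channels):
    identity[c, c, 1, 1] = 1.0
weight = weight + identity
bias = conv3x3.bias.data + conv1x1.bias.data  # identity adds no bias

merged = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
merged.weight.data.copy_(weight)
merged.bias.data.copy_(bias)

# The single-branch conv reproduces the multi-branch output exactly.
x = torch.randn(2, channels, 16, 16)
assert torch.allclose(conv3x3(x) + conv1x1(x) + x, merged(x), atol=1e-5)
```

Offline re-parameterization performs this merge once after training; the online variant adopted here keeps the block collapsed during training as well, which is where the reported savings in training time and memory come from.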

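For the position embeddings, the idea can be pictured as a depthwise convolution whose zero padding marks the borders of the feature map, letting the encoder infer each token's absolute position without a learned position table. The module below is a hypothetical minimal sketch of this mechanism; its name, residual placement, and kernel size are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ZeroPadPosEmbed(nn.Module):
    """Depthwise 3x3 convolution whose zero padding leaks absolute
    position information into the token features (residual form)."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim) sequence entering the encoder
        b, n, d = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, d, h, w)
        x = x + self.dwconv(x)  # residual keeps the original content
        return x.flatten(2).transpose(1, 2)

# Usage: stands in for a learned position table before the encoder.
pos = ZeroPadPosEmbed(dim=64)
tokens = torch.randn(2, 7 * 7, 64)   # a 7x7 feature map as 49 tokens
tokens = pos(tokens, h=7, w=7)       # now position-aware, same shape
```

Because the depthwise convolution adds only a small, fixed cost independent of sequence length, the encoder regains absolute position information without the inference overhead that conventional position embeddings incur.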

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code Availability

The code is available upon reasonable request.


Acknowledgements

This work was supported by the Natural Science Foundation of Jiangsu Province of China (Grant Nos. BK20180594 and BK20231036).

Author information


Corresponding author

Correspondence to De Gu.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest related to this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gu, D., Lv, M., Liu, J. et al. Highly efficient gaze estimation method using online convolutional re-parameterization. Multimed Tools Appl 83, 83867–83887 (2024). https://doi.org/10.1007/s11042-024-18941-2
