
Progressive local-to-global vision transformer for occluded face hallucination

Published in: Multimedia Tools and Applications

Abstract

Hallucinating a photo-realistic high-resolution (HR) face image from an occluded low-resolution (LR) face image benefits a range of face-related applications. However, previous efforts have focused on either super-resolving HR face images from non-occluded LR counterparts or inpainting occluded HR faces. For real-world face images captured in unconstrained environments, these challenges must be addressed jointly. In this paper, we develop a novel Local-to-Global Face Hallucination Transformer (LGFH-Transformer), which handles super-resolution (SR) and inpainting of occluded LR face images simultaneously in a unified framework. Specifically, the LGFH-Transformer is built on self-attention modules, which excel at modeling long-range dependencies between image-patch sequences. Meanwhile, we introduce a mask-guided convolution and a gated mechanism into the building modules (i.e., the multi-head attention and feed-forward network) of each Transformer block, bringing in the complementary strength of the convolution operation to emphasize spatially local context. Moreover, equipped with a delicately designed local-to-global feature reasoning mechanism in the encoder, we exploit facial geometry priors (i.e., facial parsing maps) as semantic guidance during the hallucination process in the decoder to reconstruct more realistic facial details. Extensive experiments demonstrate the effectiveness of the LGFH-Transformer and its advantages over existing methods.
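The abstract's core idea of combining long-range self-attention with a mask-guided gate can be illustrated with a minimal sketch. This is not the authors' implementation; all names (`gated_masked_attention`, the weight matrices, the gating formula) are hypothetical, and it shows only the general pattern: attention aggregates global patch context, while a sigmoid gate, scaled by patch visibility, decides how much each patch relies on that aggregated context versus its own features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_masked_attention(tokens, visibility, w_q, w_k, w_v, w_g):
    """Hypothetical single-head, mask-guided gated self-attention.

    tokens:     (n, d) image-patch embeddings
    visibility: (n,)   1.0 = visible patch, 0.0 = occluded patch
    w_q, w_k, w_v, w_g: (d, d) projection weights
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    d = q.shape[-1]
    # Long-range mixing: every patch attends to every other patch.
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    mixed = attn @ v
    # Gate in [0, 1], scaled by visibility: an occluded patch (visibility 0)
    # gets gate 0 and is reconstructed entirely from global context.
    gate = 1.0 / (1.0 + np.exp(-(tokens @ w_g))) * visibility[:, None]
    return gate * tokens + (1.0 - gate) * mixed
```

In this toy form, a fully occluded patch's output comes entirely from the attention-mixed global context, which mirrors the intuition of inpainting occluded regions from visible ones while SR refines visible regions locally.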


Data Availability

The original contributions presented in the study are included in the article and supplementary material; further inquiries can be directed to the corresponding author.


Author information


Corresponding author

Correspondence to Chengdong Wu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Chi, J., Wu, C. et al. Progressive local-to-global vision transformer for occluded face hallucination. Multimed Tools Appl 83, 8219–8240 (2024). https://doi.org/10.1007/s11042-023-15028-2
