Abstract
Hallucinating a photo-realistic high-resolution (HR) face image from an occluded low-resolution (LR) face image benefits a range of face-related applications. However, previous efforts have focused on either super-resolving HR face images from non-occluded LR counterparts or inpainting occluded HR faces; for real-world face images captured in unconstrained environments, these challenges must be addressed jointly. In this paper, we develop a novel Local-to-Global Face Hallucination Transformer (LGFH-Transformer) that handles super-resolution (SR) and inpainting of occluded LR face images simultaneously in a unified framework. Specifically, the LGFH-Transformer is built on self-attention modules, which excel at modeling long-range dependencies between image patch sequences. Meanwhile, we introduce a mask-guided convolution and a gated mechanism into the building modules (i.e., the multi-head attention and feed-forward network) of each Transformer block, bringing in the complementary strength of the convolution operation to emphasize spatially local context. Moreover, equipped with a delicately designed local-to-global feature reasoning mechanism in the encoder, we exploit facial geometry priors (i.e., facial parsing maps) as semantic guidance during the hallucination process in the decoder to reconstruct more realistic facial details. Extensive experiments demonstrate the effectiveness and superiority of the LGFH-Transformer.
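The abstract's idea of injecting a mask-guided convolution with a gated mechanism into a Transformer's feed-forward network can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the paper's actual architecture: the module names (`MaskGuidedGatedConv`, `GatedFFN`), channel counts, and kernel sizes are hypothetical, and the paper's exact design of the attention and feed-forward sub-blocks may differ.

```python
import torch
import torch.nn as nn


class MaskGuidedGatedConv(nn.Module):
    """Sketch of a mask-guided gated convolution: a feature branch is
    modulated by a sigmoid gate branch, and the binary occlusion mask is
    concatenated to the input so the gate can learn to suppress features
    under occluded pixels. (Illustrative only.)"""

    def __init__(self, channels):
        super().__init__()
        # +1 input channel carries the occlusion mask
        self.feat = nn.Conv2d(channels + 1, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W), 1 = occluded
        inp = torch.cat([x, mask], dim=1)
        return self.feat(inp) * torch.sigmoid(self.gate(inp))


class GatedFFN(nn.Module):
    """Feed-forward sub-block of a Transformer layer with the gated,
    mask-guided convolution inserted to capture spatially local context,
    complementing the global self-attention sub-block."""

    def __init__(self, channels, expansion=2):
        super().__init__()
        self.proj_in = nn.Conv2d(channels, channels * expansion, 1)
        self.conv = MaskGuidedGatedConv(channels * expansion)
        self.proj_out = nn.Conv2d(channels * expansion, channels, 1)

    def forward(self, x, mask):
        h = torch.relu(self.proj_in(x))
        h = self.conv(h, mask)
        return x + self.proj_out(h)  # residual connection


x = torch.randn(1, 16, 32, 32)
mask = torch.zeros(1, 1, 32, 32)
mask[:, :, 8:16, 8:16] = 1.0  # a square occluded region
out = GatedFFN(16)(x, mask)
print(tuple(out.shape))
```

The gating follows the spirit of gated convolutions for inpainting (cf. the partial-convolution line of work cited below by Liu et al.): the sigmoid branch acts as a learned, per-pixel validity weight, which is what lets the same block treat occluded and visible regions differently.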
Data Availability
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, H., Chi, J., Wu, C. et al. Progressive local-to-global vision transformer for occluded face hallucination. Multimed Tools Appl 83, 8219–8240 (2024). https://doi.org/10.1007/s11042-023-15028-2