Face inpainting based on GAN by facial prediction and fusion as guidance information
Introduction
Image inpainting aims to fill in the missing part of an image with visually plausible contents. Recent advances in deep generative models have shown promising potential [1], [2], [3], [4], [5]. Barnes et al. [1] utilized the continuity of images to reduce the search scope of patch similarity and introduced a fast nearest-neighbor field algorithm called PatchMatch (PM) for image editing applications. Iizuka et al. [2] used both a local discriminator and a global discriminator (GLCIC), operating on the missing area and the overall image respectively. Yu et al. [3] proposed a two-stage generative image inpainting network with contextual attention (CA) to refine the low-resolution images generated by the first stage. Zheng et al. [4] proposed a pluralistic image completion network (PICNet) with a reconstructive path and a generative path to create multiple plausible results. Zeng et al. [5] used a pyramid-context encoder network (PEN) for image inpainting. Liu et al. [6] designed coherent semantic attention (CSA) to complete image inpainting by modeling the similarity between the inpainted region and its neighborhood. Yu et al. [7] proposed region normalization (RN) to repair the missing areas. Based on these frameworks, there is a growing body of work on image inpainting, such as [8] and [9].
However, existing methods either slow down performance considerably or generate unsatisfying results with easily detectable flaws. Moreover, there is often perceivable discontinuity near the holes, which requires further post-processing to refine the results. In particular, for face inpainting, face region semantic information (face structure, contour, and content information) has not been fully explored, which leads to unnatural face images, such as asymmetric eyebrows and eyes of different sizes. This is unacceptable in many practical applications. Unlike conventional image inpainting, face inpainting requires content, contour, and structure information about the target object for realistic outputs. General image inpainting methods do not consider these particularities of the human face and do not fully explore and utilize the information contained in it, which leads to unnatural and blurry face images. Consequently, face inpainting remains a challenging problem.
Furthermore, only a few papers are dedicated to the face inpainting task [10], [11]. They generally incorporate simple face features into the generator for human face completion. However, the advantage of face region semantic information has not been fully explored and utilized, which again leads to unnatural face images, especially for corrupted face images with large holes. In this paper, we focus on inpainting faces with large missing regions, a problem with three key difficulties. First, for a corrupted face image with large holes, it is inadvisable to complete the missing facial region solely from the other facial regions. A large missing area with a square mask is harder to complete than one with an irregular mask or a small square mask, which is why some image inpainting methods only inpaint images cropped with irregular or small square masks. The main reason is that the receptive field of the convolution kernel is square, so the kernel cannot capture any information once it lies entirely inside a large square missing region, whereas for an irregular or small mask the kernel can still catch useful information from the background within its receptive field. Second, the content of the missing area differs greatly from the content of the background area, so it is very difficult to generate a natural and harmonious face from the background image alone. For example, some inpainting methods use attention to find similar blocks in the background to repair the missing area, but matching each missing block of each image against the surrounding background blocks takes a long time to train and easily produces distorted facial features. Third, completion of large missing face regions should focus on reconstructing facial parts with natural and harmonious features rather than merely minimizing a restoration error.
Therefore, face inpainting remains a challenging problem as it requires generating semantically new pixels for the missing key components with consistency on structures and appearance. How to improve the adaptability of the repairing network and the correctness of repairing results still requires further studies.
In this paper, for inpainting faces with large missing regions, a generative adversarial network (GAN) with facial prediction and fusion as guidance information (FIFPNet) is proposed to generate high-resolution face images with photo-realistic (natural) details. It adopts two stages, decomposing the task into prior face image generation and high-resolution face inpainting. In Stage-I, we construct a new encoder–decoder network with a variational autoencoder (VAE)-based backbone to predict face semantic knowledge (including contour, structure, and content information) and fuse this information for face inpainting. Specifically, we use the cropped images to reconstruct the face landmark, face part, and face mask, obtaining face structure, content, and contour information, and merge these as a guide to complete the face inpainting. This fully explores face region semantic information and generates coordinated coarse face images. Based on the Stage-I result, Stage-II uses a generative adversarial network to refine the face image. Both global and patch discriminators are used to synthesize high-quality photo-realistic results. The main contributions of this paper are summarized as follows:
- To fully explore the semantic information of the face, FIFPNet uses two encoders and three decoders to obtain face structure, content, and contour information from the face landmark, face part, and face mask respectively, and completes the feature fusion. Meanwhile, we reconstruct the landmark to ensure a fusion relationship with structural information as the leading factor and contour and content information as auxiliaries, achieving natural and harmonious face completion.
- Traditional VAE uses a Kullback–Leibler (KL) divergence term as the distribution constraint on the intermediate sampling, but a large KL divergence can greatly disturb the reconstruction loss. For the sampled features, we use a discriminator instead of the KL divergence as the constraint; it plays an adversarial game against the encoder and enhances the encoder's learning ability.
- Fusing the face region semantic information into both the Stage-I and Stage-II generators not only guides the face inpainting but also enhances the robustness of the generators.
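The adversarial latent constraint described in the second contribution can be sketched as follows. This is a minimal toy illustration, in the spirit of adversarial autoencoders, of replacing the KL term with a code discriminator that tries to tell prior samples from encoder outputs; the logistic discriminator, shapes, and variable names are our own assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(z, w, b):
    # Tiny logistic "code discriminator": predicts whether a latent code
    # was drawn from the N(0, I) prior (label 1) or from the encoder (label 0).
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

def bce(p, y, eps=1e-7):
    # Binary cross-entropy, the usual GAN discriminator loss.
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

dim = 8
w, b = rng.normal(size=dim), 0.0

z_prior = rng.normal(size=(32, dim))         # codes sampled from the N(0, I) prior
z_enc = rng.normal(loc=2.0, size=(32, dim))  # stand-in encoder codes (off-prior)

# Discriminator objective: label prior codes 1 and encoder codes 0.
d_loss = bce(discriminator(z_prior, w, b), 1.0) + bce(discriminator(z_enc, w, b), 0.0)

# Encoder ("generator") objective: fool the discriminator into labeling
# its codes as prior samples, which pushes the encoder output toward N(0, I).
e_loss = bce(discriminator(z_enc, w, b), 1.0)
```

Minimizing `e_loss` over the encoder replaces the explicit KL penalty, so the reconstruction loss is no longer dominated by a large divergence term.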
The remainder of this paper is organized as follows. Section 2 describes the current mainstream image inpainting methods and face inpainting methods. FIFPNet is introduced in Section 3. In Section 4, experimental results and ablation study demonstrate the effectiveness and efficiency of FIFPNet. Finally, our conclusions are presented in Section 5.
Variational Auto-Encoder (VAE).
An autoencoder performs reconstruction: the encoder extracts data features and the decoder decodes them directly. Kingma et al. [12] assumed that the latent features follow a normal distribution and proposed the Variational Auto-Encoder (VAE) to constrain the distribution of the latent feature vector extracted by the encoder. Given a real sample x, VAE supposes there is a latent distribution q(z|x) dedicated to x, and further assumes that q(z|x) follows a Gaussian distribution.
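The standard VAE machinery referenced here can be sketched as follows. This is a generic illustration of the reparameterization trick and the closed-form KL divergence to the N(0, I) prior, not the paper's code; the Gaussian posterior is parameterized by a mean and a log-variance as is conventional:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): keeps sampling differentiable
    # with respect to the encoder outputs mu and log_var.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims.
    return float(np.sum(0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)))

mu, log_var = np.zeros(4), np.zeros(4)
z = reparameterize(mu, log_var)
# When the posterior equals the prior N(0, I), the KL term vanishes.
assert kl_to_standard_normal(mu, log_var) == 0.0
```

It is exactly this KL term that FIFPNet replaces with an adversarial constraint on the sampled codes.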
Proposed algorithm
Existing face inpainting methods usually ignore or fail to take full advantage of the face region semantic information (i.e. structural information). They merely use this related information as loss terms to constrain the completed face image, which has weak influence on constructing facial structure information and optimizing facial texture information. For example, symmetry relations may disappear in these methods. Therefore, we consider fusing face structure, content, and contour information into the generator as guidance.
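One common way to realize such a fusion is channel-wise concatenation of the three branches' feature maps followed by a learned linear (1x1-convolution-style) mix. The sketch below is purely illustrative: the shapes, branch names, and mixing operation are our assumptions, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps (channels, H, W) decoded from the three branches.
f_landmark = rng.normal(size=(16, 32, 32))  # structure information
f_part = rng.normal(size=(16, 32, 32))      # content information
f_mask = rng.normal(size=(16, 32, 32))      # contour information

# Concatenate along the channel axis, then mix back down with a 1x1
# convolution (equivalent to a per-pixel linear map over channels).
fused_in = np.concatenate([f_landmark, f_part, f_mask], axis=0)  # (48, 32, 32)
w = rng.normal(size=(16, 48)) / np.sqrt(48)                      # 1x1 conv weights
guidance = np.einsum('oc,chw->ohw', w, fused_in)                 # (16, 32, 32)
```

The fused `guidance` tensor can then be injected into the generator so that structure, content, and contour cues jointly steer the completion rather than acting only through loss terms.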
Experiment
Conclusion
In this paper, we proposed a new generative framework with a GAN-based backbone that predicts and fuses facial semantic information to guide the inpainting of faces with large missing regions. It consists of two stages. Stage-I embeds the face region semantic information into latent variables as guidance information for face inpainting and generates low-resolution coordinated face images. Stage-II combines the generator with both global and patch discriminators and yields clear, high-quality photo-realistic results.
CRediT authorship contribution statement
Xian Zhang: Conceptualization, Methodology, Software. Canghong Shi: Validation, Resources. Xin Wang: Writing - review & editing. Xi Wu: Writing - review & editing. Xiaojie Li: Writing - original draft, Funding acquisition. Jiancheng Lv: Formal analysis, Investigation. Imran Mumtaz: Formal analysis, Investigation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Sichuan Science and Technology Program, China (2019JDJQ0002, 2018GZ0184, and 2018RZ 0072), the Key Project of Natural Science of Sichuan Provincial Education Department, China (17ZA0063), and the Natural Science Foundation For Young Scientists of Chengdu University of Information Technology, China (J201704).
References (59)
- et al., Cycle-consistent GAN-based stain translation of renal pathology images with glomerulus detection application, Appl. Soft Comput. (2021)
- et al., A virtual sample generation approach based on a modified conditional GAN and centroidal voronoi tessellation sampling to cope with small sample size problems: Application to soft sensing for chemical process, Appl. Soft Comput. (2021)
- et al., Imbalanced sample fault diagnosis of rotating machinery using conditional variational auto-encoder generative adversarial network, Appl. Soft Comput. (2020)
- et al., Image completion using planar structure guidance, ACM Trans. Graph. (2014)
- et al., Multi-scale semantic image inpainting with residual learning and GAN, Neurocomputing (2019)
- et al., Patchmatch: A randomized correspondence algorithm for structural image editing
- et al., Globally and locally consistent image completion, ACM Trans. Graph. (2017)
- J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T.S. Huang, Generative image inpainting with contextual attention, in: ...
- C. Zheng, T.-J. Cham, J. Cai, Pluralistic image completion, in: Proceedings of the IEEE Conference on Computer Vision ...
- Y. Zeng, J. Fu, H. Chao, B. Guo, Learning pyramid-context encoder network for high-quality image inpainting, in: ...
- Region normalization for image inpainting
- Face completion with semantic knowledge and collaborative adversarial learning
- Auto-encoding variational bayes
- Generating diverse high-fidelity images with vq-vae-2
- Beta-vae: Learning basic visual concepts with a constrained variational framework
- Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT, Sensors
- Generative adversarial nets
- Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography, Med. Phys.
- D-SRGAN: DEM super-resolution with generative adversarial networks, SN Comput. Sci.
- Research on super-resolution reconstruction algorithm of remote sensing image based on generative adversarial networks
- A survey of image synthesis and editing with generative adversarial networks, Tsinghua Sci. Technol.
- Anycost GANs for interactive image synthesis and editing
- PISE: Person image synthesis and editing with decoupled GAN