
Applied Soft Computing

Volume 111, November 2021, 107626

Face inpainting based on GAN by facial prediction and fusion as guidance information

https://doi.org/10.1016/j.asoc.2021.107626

Highlights

  • FIFPNet predicts and fuses face semantic information to complete face inpainting.

  • We use a discriminator instead of the KL loss in the VAE, which enhances the encoder's learning ability.

  • Guidance information can enhance the robustness of the generators in FIFPNet.

Abstract

Face inpainting, a special case of image inpainting, aims to complete occluded facial regions under unconstrained pose and orientation. However, existing methods generate unsatisfying results with easily detectable flaws: boundaries and details near the holes are often fuzzy. In particular, the face region semantic information (face structure, contour, and content information) has not been fully utilized, which leads to unnatural face images with artifacts such as asymmetric eyebrows and eyes of different sizes. This is unacceptable in many practical applications. To solve these problems, a new generative adversarial network that uses facial prediction and fusion as guidance information is proposed for face inpainting with large missing regions. The proposed method adopts two stages, covering coarse inpainting and refinement of the face. In Stage-I, we combine the generator with a new encoder–decoder network with a variational autoencoder-based backbone to predict the face region semantic information (including face structure, contour, and content information) and perform facial fusion for face inpainting. This fully explores face region semantic information and generates coordinated coarse face images. Stage-II builds upon the Stage-I results to refine the face image. Both global and patch discriminators are used to synthesize high-quality photo-realistic inpainting. Experimental results on both the CelebA and CelebA-HQ datasets demonstrate the effectiveness and efficiency of our method.

Introduction

Image inpainting aims to fill in the missing part of an image with visually plausible content. Recent advances in deep generative models have shown promising potential [1], [2], [3], [4], [5]. Barnes et al. [1] utilized the continuity of images to reduce the search scope of patch similarity and introduced a fast nearest-neighbor field algorithm called PatchMatch (PM) for image editing applications. Iizuka et al. [2] used both a local discriminator and a global discriminator (GLCIC) to assess the missing area and the overall image, respectively. Yu et al. [3] proposed two-stage generative image inpainting with contextual attention (CA) to refine the low-resolution result generated by the first stage. Zheng et al. [4] proposed a pluralistic image completion network (PICNet) with a reconstructive path and a generative path to create multiple plausible results. Zeng et al. [5] used a pyramid-context encoder network (PEN) to complete image inpainting. Liu et al. [6] designed coherent semantic attention (CSA) to complete image inpainting by modeling the similarity between the inpainted region and its neighborhood. Yu et al. [7] proposed region normalization (RN) to repair the missing areas. Based on these frameworks, there is a growing body of work on image inpainting, such as [8] and [9].

However, existing methods either suffer severe performance penalties or generate unsatisfying results with easily detectable flaws. Moreover, there is often perceivable discontinuity near the holes, which requires further post-processing to refine the results. In particular, for face inpainting, the face region semantic information (face structure, contour, and content information) has not been fully explored, which leads to unnatural face images with artifacts such as asymmetric eyebrows and eyes of different sizes. This is unacceptable in many practical applications. Unlike conventional image inpainting, face inpainting requires content, contour, and structure information about the target object for realistic outputs. General image inpainting methods do not consider these particularities of the human face and do not fully explore and utilize the information it contains, which leads to unnatural and blurry face images. Consequently, face inpainting remains a challenging problem.

Furthermore, only a few papers are dedicated to the face inpainting task [10], [11]. They generally incorporate simple face features into the generator for face completion. However, the face region semantic information has still not been fully explored and utilized, which again leads to unnatural face images, especially for corrupted face images with large holes. In this paper, we focus on inpainting faces with large missing regions, for which three key difficulties remain. First, for a corrupted face image with large holes, it is impractical to complete the missing facial region purely from the remaining facial regions. A large square mask is harder to complete than an irregular or small square mask, which is why some image inpainting methods only inpaint images cropped with irregular or small square masks. The main reason is that the receptive field of a convolution kernel is square, and the kernel cannot capture any valid information once it lies entirely inside a large square missing region, whereas for an irregular or small mask the kernel can still catch useful information from the background within its receptive field; the sketch after this paragraph makes this argument concrete. Second, the content of the missing area differs greatly from the content of the background area, so it is very difficult to generate a natural and harmonious face from the background image alone. For example, some image inpainting methods use attention to find blocks similar to the missing area in the background; however, matching each missing block of each image against the surrounding background blocks takes a long time to train and easily produces distorted facial features. Third, completing large missing face regions should focus on reconstructing facial parts with natural and harmonious features rather than merely on pixel-level restoration. Therefore, face inpainting remains challenging, as it requires generating semantically new pixels for the missing key components with consistency in structure and appearance. How to improve the adaptability of the inpainting network and the correctness of its results still requires further study.
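As a rough illustration of the receptive-field argument above, the following sketch (our own, with hypothetical layer counts; not from the paper) computes how far valid context can propagate through a stack of 3×3 convolutions, and how wide the "blind" core of a square hole therefore is:

```python
# Illustration of the receptive-field argument (hypothetical numbers, not
# from the paper): with stride-1 3x3 convolutions, the receptive field grows
# by 2 pixels per layer, so pixels deep inside a large square hole receive
# no valid background information at all.
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (pixels) of a stack of stride-1 conv layers."""
    return 1 + num_layers * (kernel_size - 1)

hole = 128  # side length of a square mask, in pixels
for n in (4, 8, 16):
    rf = receptive_field(n)
    # Pixels farther than rf // 2 from the hole boundary see only the hole.
    blind = max(0, hole - 2 * (rf // 2))
    print(f"{n} layers: receptive field {rf}px, "
          f"blind core of a {hole}px hole is {blind}px wide")
```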

In this paper, for face inpainting with large missing regions, a generative adversarial network (GAN) with facial prediction and fusion as guidance information (FIFPNet) is proposed to generate high-resolution face images with photo-realistic (natural) details. It adopts two stages, decomposing the task into a prior face image generation process and high-resolution face inpainting. In Stage-I, we construct a new encoder–decoder network with a variational autoencoder (VAE)-based backbone to predict face semantic knowledge (including contour, structure, and content information) and perform information fusion for face inpainting. Specifically, from the cropped images we reconstruct the face landmark, face part, and face mask to obtain face structure, content, and contour information, and merge this information as a guide to complete the face inpainting. This fully explores face region semantic information and generates coordinated coarse face images. Based on the Stage-I result, Stage-II uses a generative adversarial network to refine the face image. Both global and patch discriminators are used to synthesize high-quality photo-realistic inpainting. A structural sketch of this two-stage pipeline is given below.
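To make the pipeline structure concrete, here is a minimal PyTorch sketch of the two stages as we read them; all module names, channel widths, and the concatenation-based fusion are our own illustrative assumptions, not the paper's released code:

```python
# Minimal structural sketch of the two-stage pipeline (illustrative only;
# module names, channel sizes, and fusion-by-concatenation are assumptions).
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class SemanticPredictor(nn.Module):
    """Stage-I prior network: an encoder and three decoders that predict the
    face landmark (structure), face part (content), and face mask (contour)
    from the corrupted input."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(3, 32, 2), conv_block(32, 64, 2))
        self.dec_landmark, self.dec_part, self.dec_mask = (
            self._decoder(), self._decoder(), self._decoder())

    def _decoder(self):
        return nn.Sequential(nn.Upsample(scale_factor=2), conv_block(64, 32),
                             nn.Upsample(scale_factor=2),
                             nn.Conv2d(32, 3, 3, 1, 1), nn.Tanh())

    def forward(self, corrupted):
        z = self.encoder(corrupted)
        return self.dec_landmark(z), self.dec_part(z), self.dec_mask(z)

class GuidedGenerator(nn.Module):
    """Coarse (Stage-I) or refinement (Stage-II) generator that fuses the
    predicted semantics with its image input by channel concatenation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(12, 64), conv_block(64, 64),
                                 nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh())

    def forward(self, image, landmark, part, mask):
        return self.net(torch.cat([image, landmark, part, mask], dim=1))

# Forward pass: Stage-I produces a coarse face, Stage-II refines it; both
# global and patch discriminators (not shown) then judge the result.
predictor, coarse_g, refine_g = SemanticPredictor(), GuidedGenerator(), GuidedGenerator()
corrupted = torch.randn(1, 3, 256, 256)
landmark, part, mask = predictor(corrupted)
coarse = coarse_g(corrupted, landmark, part, mask)
refined = refine_g(coarse, landmark, part, mask)
```

The main contributions of this paper are summarized as follows: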

  • To fully explore the semantic information of the face, FIFPNet uses two encoders and three decoders to obtain face structure, content, and contour information from the face landmark, face part, and face mask, respectively, and completes the feature fusion. Meanwhile, we reconstruct the landmark to ensure a fusion relationship in which structural information is the leading factor and contour and content information are auxiliary, yielding natural and harmonious face completion.

  • A traditional VAE uses the Kullback–Leibler (KL) divergence term as the distribution constraint on the intermediate sampling, but the large magnitude of the KL divergence can greatly disturb the reconstruction loss. For the sampling in feature extraction, we therefore use a discriminator instead of the KL divergence as the constraint; it plays an adversarial game with the encoder and enhances the encoder's learning ability (see the sketch after this list).

  • Fusing the face region semantic information into both the Stage-I and Stage-II generators not only guides the face inpainting but also enhances the robustness of the generators.
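As referenced in the second contribution above, here is a minimal sketch of regularizing the latent code with a discriminator instead of a KL term (adversarial-autoencoder-style; the network sizes and loss form are our assumptions, not the paper's exact formulation):

```python
# Sketch: replace the VAE's KL penalty with a latent-code discriminator
# (sizes and loss form are illustrative assumptions).
import torch
import torch.nn as nn

latent_dim = 256
code_disc = nn.Sequential(nn.Linear(latent_dim, 128), nn.LeakyReLU(0.2),
                          nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def latent_adversarial_losses(z_encoded):
    """z_encoded: encoder outputs of shape (B, latent_dim). The discriminator
    pushes them toward the N(0, I) prior, replacing the KL term whose large
    magnitude can swamp the reconstruction loss."""
    z_prior = torch.randn_like(z_encoded)
    real = torch.ones(len(z_encoded), 1)
    fake = torch.zeros(len(z_encoded), 1)
    # Discriminator: prior samples are "real", encoder codes are "fake".
    d_loss = (bce(code_disc(z_prior), real) +
              bce(code_disc(z_encoded.detach()), fake))
    # Encoder: fool the discriminator so its codes look prior-distributed.
    e_loss = bce(code_disc(z_encoded), real)
    return d_loss, e_loss
```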

The remainder of this paper is organized as follows. Section 2 describes the current mainstream image inpainting and face inpainting methods. FIFPNet is introduced in Section 3. In Section 4, experimental results and an ablation study demonstrate the effectiveness and efficiency of FIFPNet. Finally, our conclusions are presented in Section 5.

Section snippets

Variational Auto-Encoder (VAE)

An autoencoder performs reconstruction: the encoder extracts data features and the decoder decodes them directly. Kingma et al. [12] assumed that the data follow a normal distribution in latent space, and proposed the Variational Auto-Encoder (VAE) to constrain the distribution of the latent feature vector z extracted by the encoder. Given a real sample xk, the VAE supposes there is a distribution p(z|xk) dedicated to xk, and further assumes that p(z|xk) follows a Gaussian distribution. Thus,
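The snippet is truncated at this point. For reference, the standard VAE formulation this setup leads to, in the usual notation (our reconstruction; the paper's exact continuation is not shown here), is:

```latex
% Gaussian posterior per sample x_k, with reparameterized sampling:
p(z \mid x_k) = \mathcal{N}\bigl(\mu_k, \sigma_k^{2} I\bigr), \qquad
z_k = \mu_k + \sigma_k \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I).
% Training maximizes the evidence lower bound (ELBO):
\mathcal{L}(x_k) = \mathbb{E}_{q(z \mid x_k)}\left[\log p(x_k \mid z)\right]
                 - D_{\mathrm{KL}}\bigl(q(z \mid x_k) \,\|\, \mathcal{N}(0, I)\bigr).
```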

Proposed algorithm

Existing face inpainting methods usually ignore or fail to take full advantage of the face region semantic information (i.e., structural information). They only use this related information as loss terms to constrain the completed face image, which has weak influence on constructing facial structure and optimizing facial texture; for example, symmetry relations may disappear in these methods. Therefore, we consider fusing face structure, content, and contour
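The snippet truncates here. To make the contrast concrete, the following sketch (our illustration, with hypothetical function names) shows semantic information used only as a loss term versus fused directly into the generator input:

```python
# Contrast sketch (hypothetical names): loss-term guidance vs. input fusion.
import torch

def loss_only_guidance(pred, gt, landmark_net, recon_loss, w=10.0):
    # Prior methods: semantic information enters only as a loss term,
    # which constrains structure weakly and after the fact.
    return recon_loss(pred, gt) + w * torch.mean(
        (landmark_net(pred) - landmark_net(gt)) ** 2)

def input_fusion_guidance(generator, corrupted, landmark, part, mask):
    # FIFPNet-style: predicted semantics are concatenated into the
    # generator's input, so structure guides synthesis directly.
    return generator(torch.cat([corrupted, landmark, part, mask], dim=1))
```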

Experiment

Conclusion

In this paper, we proposed a new generative framework with a GAN-based backbone that predicts and fuses facial semantic information to guide the inpainting of large missing face regions. It consists of two stages. Stage-I embeds the face region semantic information into latent variables as guidance information for face inpainting, generating low-resolution coordinated face images. Stage-II combines the generator with both global and patch discriminators and yields clear and

CRediT authorship contribution statement

Xian Zhang: Conceptualization, Methodology, Software. Canghong Shi: Validation, Resources. Xin Wang: Writing - review & editing. Xi Wu: Writing - review & editing. Xiaojie Li: Writing - original draft, Funding acquisition. Jiancheng Lv: Formal analysis, Investigation. Imran Mumtaz: Formal analysis, Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Sichuan Science and Technology Program, China (2019JDJQ0002, 2018GZ0184, and 2018RZ0072), the Key Project of Natural Science of Sichuan Provincial Education Department, China (17ZA0063), and the Natural Science Foundation for Young Scientists of Chengdu University of Information Technology, China (J201704).

References (59)

  • H. Liu, B. Jiang, Y. Xiao, C. Yang, Coherent semantic attention for image inpainting, in: IEEE International Conference...
  • T. Yu et al., Region normalization for image inpainting.
  • G. Liu, F.A. Reda, K.J. Shih, T.-C. Wang, A. Tao, B. Catanzaro, Image inpainting for irregular holes using partial...
  • J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T.S. Huang, Free-form image inpainting with gated convolution, in: Proceedings...
  • H. Liao et al., Face completion with semantic knowledge and collaborative adversarial learning.
  • Y. Li, S. Liu, J. Yang, M.-H. Yang, Generative face completion, in: Proceedings of the IEEE Conference on Computer...
  • D.P. Kingma et al., Auto-encoding variational Bayes (2013).
  • A. Razavi et al., Generating diverse high-fidelity images with VQ-VAE-2 (2019).
  • I. Higgins et al., beta-VAE: Learning basic visual concepts with a constrained variational framework (2016).
  • M. Lopez-Martin et al., Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT, Sensors (2017).
  • I.J. Goodfellow et al., Generative adversarial nets.
  • J. Harms et al., Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography, Med. Phys. (2019).
  • H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D.N. Metaxas, StackGAN: Text to photo-realistic image synthesis...
  • T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial networks, in: Proceedings...
  • B.Z. Demiray et al., D-SRGAN: DEM super-resolution with generative adversarial networks, SN Comput. Sci. (2021).
  • J. Wenjie et al., Research on super-resolution reconstruction algorithm of remote sensing image based on generative adversarial networks.
  • X. Wu et al., A survey of image synthesis and editing with generative adversarial networks, Tsinghua Sci. Technol. (2017).
  • J. Lin et al., Anycost GANs for interactive image synthesis and editing (2021).
  • J. Zhang et al., PISE: Person image synthesis and editing with decoupled GAN (2021).