
Attentive Semantic and Perceptual Faces Completion Using Self-attention Generative Adversarial Networks

Published in Neural Processing Letters.

Abstract

We propose an approach based on self-attention generative adversarial networks (GANs) for image completion, producing completed images that are both globally and locally consistent. Using self-attention GANs with contextual and other constraints, the generator draws realistic images in which fine details are generated inside the damaged region and are semantically coordinated with the image as a whole. To train this generator, i.e. the image completion network, we employ global and local discriminators: the global discriminator evaluates the consistency of the entire image, while the local discriminator assesses local consistency by analyzing only the local areas that contain the completed regions. Finally, an attentive recurrent neural block is introduced to obtain an attention map of the missing part of the image, which helps the subsequent completion network fill in content more effectively. Comparisons with other approaches on the CelebA dataset show that our method achieves relatively good results.
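The self-attention mechanism that the generator relies on can be illustrated with a minimal sketch following the general SAGAN formulation: feature positions attend to all other positions, and a learned scalar blends the attention output back into the input. The function and weight names below (`self_attention`, `wf`, `wg`, `wh`, `gamma`) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features, wf, wg, wh, gamma=0.0):
    """SAGAN-style self-attention over a flattened feature map.

    features: (N, C) array of N spatial positions with C channels.
    wf, wg, wh: query/key/value projection matrices (stand-ins for
        learned 1x1 convolutions).
    gamma: learned scalar that blends the attention output with the
        input (initialized to 0 in SAGAN so training starts local).
    """
    f = features @ wf                 # queries, (N, C')
    g = features @ wg                 # keys,    (N, C')
    h = features @ wh                 # values,  (N, C)
    attn = softmax(f @ g.T, axis=-1)  # (N, N): each row sums to 1
    out = attn @ h                    # every position attends to all others
    return gamma * out + features     # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))      # e.g. a 4x4 feature map, flattened
wf = rng.standard_normal((8, 4))
wg = rng.standard_normal((8, 4))
wh = rng.standard_normal((8, 8))
y = self_attention(x, wf, wg, wh, gamma=0.1)
print(y.shape)  # (16, 8)
```

Because attention is computed over all positions, a completed pixel can draw on context from the entire face rather than only its convolutional neighborhood, which is what lets fine details in the hole stay semantically coordinated with the whole image.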



Acknowledgements

This work was supported by the National Key R&D Program of China under Grant 2018YFB1003401, and in part by the National Outstanding Youth Science Program of National Natural Science Foundation of China under Grant 61625202.

Author information


Corresponding author

Correspondence to Kenli Li.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, X., Li, K. & Li, K. Attentive Semantic and Perceptual Faces Completion Using Self-attention Generative Adversarial Networks. Neural Process Lett 51, 211–229 (2020). https://doi.org/10.1007/s11063-019-10080-2

