
Pluralistic Free-Form Image Completion

Published in the International Journal of Computer Vision.

Abstract

Image completion involves filling in the missing regions of an image with plausible content. Current image completion methods produce only one result for a given masked image, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion: the task of generating multiple diverse and plausible solutions for free-form image completion. A major challenge for learning-based approaches is that there is usually only one ground truth training instance per label for this multi-output problem. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the single ground truth to obtain a prior distribution over the missing patches and rebuilds the original image from this distribution. The other is a generative path, for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both paths are supported by adversarial learning. We then introduce a new short+long term patch attention layer that exploits distant relations between decoder and encoder features to improve appearance consistency between the original visible regions and the newly generated ones. Experiments show that our method not only yields better results than existing state-of-the-art methods on various datasets, but also provides multiple diverse outputs.
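To make the two-path design concrete, below is a minimal PyTorch sketch of how a reconstructive distribution and a generative conditional prior could be coupled during training. Everything here (the TwoPathCompletion wrapper, the enc_visible / enc_hidden / decoder submodules, the flat Gaussian latents) is an illustrative assumption, not the authors' implementation; the released code linked in the Notes is the authoritative reference.

```python
import torch
import torch.nn as nn


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, e^logvar_q) || N(mu_p, e^logvar_p) ), summed over latent dims."""
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0,
        dim=1,
    )


class TwoPathCompletion(nn.Module):
    """Hypothetical wrapper around the two training-time paths."""

    def __init__(self, enc_visible, enc_hidden, decoder):
        super().__init__()
        self.enc_visible = enc_visible  # sees only the masked image -> (mu_p, logvar_p)
        self.enc_hidden = enc_hidden    # sees the ground-truth hole -> (mu_q, logvar_q)
        self.decoder = decoder          # maps a latent + the masked image to a full image

    def forward(self, masked_img, hidden_region):
        # Generative path: conditional prior inferred from the visible pixels alone.
        mu_p, logvar_p = self.enc_visible(masked_img)
        # Reconstructive path: distribution that exploits the single ground truth.
        mu_q, logvar_q = self.enc_hidden(hidden_region)

        # Reparameterised sample for each path.
        z_rec = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        z_gen = mu_p + torch.randn_like(mu_p) * (0.5 * logvar_p).exp()

        rec = self.decoder(z_rec, masked_img)  # trained to rebuild the original image
        gen = self.decoder(z_gen, masked_img)  # free sample, judged by a discriminator

        # KL term coupling the generative prior to the reconstructive distribution.
        kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p).mean()
        return rec, gen, kl
```

At test time only the generative path would run: sampling several latents from the conditional prior and decoding each yields the multiple, diverse completions the abstract describes. The short+long term attention idea can likewise be sketched as two branches sharing decoder queries: self-attention over decoder features (short term) and cross-attention to encoder features of the visible region (long term). The layer below is an assumed SAGAN-style rendering with hypothetical names, not the paper's exact layer.

```python
import torch
import torch.nn as nn


class ShortLongTermAttention(nn.Module):
    """Assumed sketch: decoder self-attention plus decoder-to-encoder attention."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key_dec = nn.Conv2d(channels, channels // 8, 1)
        self.key_enc = nn.Conv2d(channels, channels // 8, 1)
        self.value_dec = nn.Conv2d(channels, channels, 1)
        self.value_enc = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, dec_feat, enc_feat):
        b, c, h, w = dec_feat.shape
        q = self.query(dec_feat).flatten(2).transpose(1, 2)   # (b, hw, c//8)

        # Short term: every decoder position attends to all decoder positions.
        k_d = self.key_dec(dec_feat).flatten(2)               # (b, c//8, hw)
        v_d = self.value_dec(dec_feat).flatten(2)             # (b, c, hw)
        attn_d = torch.softmax(q @ k_d, dim=-1)               # (b, hw, hw)
        short = (v_d @ attn_d.transpose(1, 2)).view(b, c, h, w)

        # Long term: decoder positions attend to encoder features of visible pixels.
        k_e = self.key_enc(enc_feat).flatten(2)
        v_e = self.value_enc(enc_feat).flatten(2)
        attn_e = torch.softmax(q @ k_e, dim=-1)
        long_ = (v_e @ attn_e.transpose(1, 2)).view(b, c, h, w)

        return dec_feat + self.gamma * (short + long_)
```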


Notes

  1. Code: https://github.com/lyndonzheng/Pluralistic-Inpainting. Demo: http://www.chuanxiaz.com/project/pluralistic.

  2. https://github.com/pathak22/context-encoder.

  3. https://github.com/satoshiiizuka/siggraph2017_inpainting.

  4. https://github.com/JiahuiYu/generative_inpainting.

  5. https://github.com/NVIDIA/partialconv.

  6. https://github.com/knazeri/edge-connect.

  7. https://github.com/JiahuiYu/generative_inpainting.

  8. http://www.chuanxiaz.com/project/pluralistic/.


Acknowledgements

This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). This research is also supported by the Monash FIT Start-up Grant.

Author information


Corresponding author

Correspondence to Chuanxia Zheng.

Additional information

Communicated by Jian Sun.



About this article


Cite this article

Zheng, C., Cham, T. J., & Cai, J. Pluralistic Free-Form Image Completion. International Journal of Computer Vision 129, 2786–2805 (2021). https://doi.org/10.1007/s11263-021-01502-7
