Abstract
Image inpainting, the process of restoring corrupted images, has seen significant advances with the advent of diffusion models (DMs). Despite these advances, current DM adaptations for inpainting, which either modify the sampling strategy or develop inpainting-specific DMs, frequently suffer from semantic inconsistencies and reduced image quality. To address these challenges, our work introduces a novel paradigm: dividing the masked image features and the noisy latent into separate branches. This division dramatically reduces the model's learning load, enabling essential masked image information to be incorporated in a nuanced, hierarchical fashion. We present BrushNet, a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM, ensuring coherent and enhanced image inpainting results. Additionally, we introduce BrushData and BrushBench to facilitate segmentation-based inpainting training and performance assessment. Our extensive experimental analysis demonstrates BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
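To make the dual-branch idea concrete, the sketch below illustrates, in PyTorch, one plausible way an additional trainable branch can inject masked-image features hierarchically into a frozen pre-trained denoising network. All names (DualBranchInpainter, Block, the stand-in conv stages) are illustrative assumptions for exposition only; the real BrushNet operates on a full diffusion UNet and its released implementation may differ in detail.

```python
# Conceptual sketch of a dual-branch inpainting design (not the official BrushNet code).
import torch
import torch.nn as nn


class Block(nn.Module):
    """One downsampling conv stage, used by both the frozen path and the extra branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c_out, c_out, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)


class DualBranchInpainter(nn.Module):
    def __init__(self, latent_ch=4, widths=(64, 128, 256)):
        super().__init__()
        # Frozen, pre-trained denoising branch (a toy stand-in for a real UNet encoder).
        chans = (latent_ch, *widths)
        self.base_down = nn.ModuleList(Block(chans[i], chans[i + 1]) for i in range(len(widths)))
        for p in self.base_down.parameters():
            p.requires_grad_(False)

        # Trainable branch: consumes noisy latent + masked-image latent + downsampled mask.
        branch_in = latent_ch + latent_ch + 1
        bchans = (branch_in, *widths)
        self.branch_down = nn.ModuleList(Block(bchans[i], bchans[i + 1]) for i in range(len(widths)))

        # Zero-initialized 1x1 convs so the added branch starts as a no-op.
        self.zero_convs = nn.ModuleList(nn.Conv2d(w, w, 1) for w in widths)
        for conv in self.zero_convs:
            nn.init.zeros_(conv.weight)
            nn.init.zeros_(conv.bias)

    def forward(self, noisy_latent, masked_latent, mask):
        h_base = noisy_latent
        h_branch = torch.cat([noisy_latent, masked_latent, mask], dim=1)
        feats = []
        for base, branch, zero in zip(self.base_down, self.branch_down, self.zero_convs):
            h_base = base(h_base)
            h_branch = branch(h_branch)
            # Hierarchical injection: add branch features into the frozen path at each scale.
            h_base = h_base + zero(h_branch)
            feats.append(h_base)
        return feats  # in a full model these would feed the frozen mid/up blocks


# Usage with dummy tensors at a 64x64 latent resolution.
model = DualBranchInpainter()
z_t = torch.randn(1, 4, 64, 64)       # noisy latent
z_masked = torch.randn(1, 4, 64, 64)  # latent of the masked (background-only) image
m = torch.zeros(1, 1, 64, 64)         # binary inpainting mask, downsampled to latent size
print([f.shape for f in model(z_t, z_masked, m)])
```

The zero-initialized projections keep the pre-trained model's behavior unchanged at the start of training, so only the branch must learn to contribute masked-image information; this is the sense in which such a branch can be attached to any pre-trained DM in a plug-and-play manner.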
Notes
1. The proposed BrushData and BrushBench will be released along with the code.
Acknowledgment
This work is supported in part by Research Matching Grant (CSE-7-2022)-RMG01.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ju, X., Liu, X., Wang, X., Bian, Y., Shan, Y., Xu, Q. (2025). BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15078. Springer, Cham. https://doi.org/10.1007/978-3-031-72661-3_9
DOI: https://doi.org/10.1007/978-3-031-72661-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72660-6
Online ISBN: 978-3-031-72661-3
eBook Packages: Computer Science, Computer Science (R0)