
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15078)


Abstract

Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs). Despite these advancements, current DM adaptations for inpainting, which involve modifications to the sampling strategy or the development of inpainting-specific DMs, frequently suffer from semantic inconsistencies and reduced image quality. Addressing these challenges, our work introduces a novel paradigm: the division of masked image features and noisy latent into separate branches. This division dramatically diminishes the model’s learning load, facilitating a nuanced incorporation of essential masked image information in a hierarchical fashion. Herein, we present BrushNet, a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM, guaranteeing coherent and enhanced image inpainting outcomes. Additionally, we introduce BrushData and BrushBench to facilitate segmentation-based inpainting training and performance assessment. Our extensive experimental analysis demonstrates BrushNet’s superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
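To make the dual-branch idea in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released implementation: an additional branch encodes the masked-image latent, the mask, and the noisy latent, and its per-resolution features are added into a frozen, stand-in base UNet encoder. The module names, channel widths, input channel counts, and the zero-initialized 1x1 projections are illustrative assumptions chosen for this sketch.

# Minimal sketch of a decomposed dual-branch injection (illustrative, not the paper's code).
import torch
import torch.nn as nn

class ToyUNetEncoderBlock(nn.Module):
    """Stand-in for one resolution level of a pre-trained diffusion UNet encoder."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
    def forward(self, x):
        return torch.relu(self.conv(x))

class MaskedImageBranch(nn.Module):
    """Extra branch: consumes the noisy latent, masked-image latent, and mask,
    and produces one feature map per UNet level for additive injection."""
    def __init__(self, latent_ch=4, widths=(64, 128, 256)):
        super().__init__()
        in_ch = latent_ch + latent_ch + 1  # noisy latent + masked-image latent + mask
        chs = [in_ch] + list(widths)
        self.blocks = nn.ModuleList(
            [ToyUNetEncoderBlock(chs[i], chs[i + 1]) for i in range(len(widths))]
        )
        # Zero-initialized 1x1 projections: at the start of training the branch
        # contributes nothing, so the frozen base model's behavior is preserved.
        self.zero_projs = nn.ModuleList([nn.Conv2d(w, w, 1) for w in widths])
        for proj in self.zero_projs:
            nn.init.zeros_(proj.weight)
            nn.init.zeros_(proj.bias)
    def forward(self, noisy_latent, masked_latent, mask):
        h = torch.cat([noisy_latent, masked_latent, mask], dim=1)
        feats = []
        for block, proj in zip(self.blocks, self.zero_projs):
            h = block(h)
            feats.append(proj(h))
        return feats

class FrozenUNetWithInjection(nn.Module):
    """Frozen base encoder whose per-level features receive additive injection."""
    def __init__(self, latent_ch=4, widths=(64, 128, 256)):
        super().__init__()
        chs = [latent_ch] + list(widths)
        self.blocks = nn.ModuleList(
            [ToyUNetEncoderBlock(chs[i], chs[i + 1]) for i in range(len(widths))]
        )
        for p in self.parameters():
            p.requires_grad_(False)  # base model stays frozen; only the branch trains
    def forward(self, noisy_latent, injected_feats):
        h = noisy_latent
        for block, extra in zip(self.blocks, injected_feats):
            h = block(h) + extra  # hierarchical feature injection
        return h

# Usage: 64x64 latents and a single-channel mask at the same resolution.
noisy = torch.randn(1, 4, 64, 64)
masked = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
branch = MaskedImageBranch()
unet = FrozenUNetWithInjection()
out = unet(noisy, branch(noisy, masked, mask))
print(out.shape)  # torch.Size([1, 256, 8, 8])

Because only the added branch carries trainable parameters and it injects features additively into a frozen base network, a branch trained once can, in principle, be attached to different pre-trained backbones, which is consistent with the plug-and-play property described in the abstract.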


Notes

  1. The proposed BrushData and BrushBench will be released along with the code.


Acknowledgment

This work is supported in part by Research Matching Grant (CSE-7-2022)-RMG01.

Author information


Corresponding author

Correspondence to Xuan Ju.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ju, X., Liu, X., Wang, X., Bian, Y., Shan, Y., Xu, Q. (2025). BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15078. Springer, Cham. https://doi.org/10.1007/978-3-031-72661-3_9


  • DOI: https://doi.org/10.1007/978-3-031-72661-3_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72660-6

  • Online ISBN: 978-3-031-72661-3

  • eBook Packages: Computer Science, Computer Science (R0)
