MagicEraser: Erasing Any Objects via Semantics-Aware Control

Li, Fan; Zhang, Zixiao; Huang, Yi; Liu, Jianzhuang; Pei, Renjing; Shao, Bin; Xu, Songcen

doi:10.1007/978-3-031-73390-1_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15086)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The traditional image inpainting task aims to restore corrupted regions by referencing the surrounding background and foreground. The object erasure task, which is in increasing demand, instead aims to erase objects and generate a harmonious background in their place. Previous GAN-based inpainting methods struggle to generate intricate textures. Emerging diffusion model-based algorithms, such as Stable Diffusion Inpainting, can generate novel content, but they often produce incongruent results at the locations of the erased objects and require high-quality text prompts as input. To address these challenges, we introduce MagicEraser, a diffusion model-based framework tailored to the object erasure task. It consists of two phases: content initialization and controllable generation. In the latter phase, we develop two plug-and-play modules called prompt tuning and semantics-aware attention refocus. Additionally, we propose a data construction strategy that generates training data specifically suited to this task. MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts. Experimental results show that our approach is a valuable advance on the object erasure task.

F. Li and Z. Zhang—Equal Contribution

Notes

1. https://github.com/lifan724/magic_eraser
2. https://github.com/runwayml/stable-diffusion
3. https://github.com/advimman/lama
4. https://github.com/facebookresearch/Mask2Former
5. https://github.com/haotian-liu/LLaVA
6. https://www.adobe.com/products/firefly.html, May 11, 2024.
7. Google Pixel 8, Build Number AP1A.240305.019.A1.

Author information

Authors and Affiliations

Huawei Noah’s Ark Lab, Montreal, Canada
Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao & Songcen Xu
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao & Songcen Xu

Corresponding author

Correspondence to Fan Li.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 41274 KB)

Cite this paper

Li, F. et al. (2025). MagicEraser: Erasing Any Objects via Semantics-Aware Control. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15086. Springer, Cham. https://doi.org/10.1007/978-3-031-73390-1_13

DOI: https://doi.org/10.1007/978-3-031-73390-1_13
Published: 31 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73389-5
Online ISBN: 978-3-031-73390-1
eBook Packages: Computer Science, Computer Science (R0)
