ABSTRACT
Image editing plays a vital role in computer vision, aiming to manipulate images realistically while ensuring that edits integrate seamlessly, and it finds numerous applications across diverse fields. In this work, we present EditAnything, a novel approach that gives users unparalleled flexibility in editing and generating image content. EditAnything introduces an array of advanced features, including cross-image dragging (e.g., virtual try-on), region-interactive editing, controllable layout generation, and virtual character replacement. By harnessing these capabilities, users can engage in interactive and flexible editing, producing captivating results that preserve the integrity of the original image. With its diverse range of tools, EditAnything caters to a wide spectrum of editing needs, pushing the boundaries of image editing and unlocking exciting new possibilities. The source code is released at https://github.com/sail-sg/EditAnything.
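Region-interactive editing of the kind described above ultimately relies on compositing an edited region back into the source image under a segmentation mask so that untouched pixels are preserved. The following is a minimal, illustrative sketch of that mask-guided compositing primitive (not the authors' implementation; the function name and array conventions are assumptions for this example):

```python
import numpy as np

def composite_region(source, edited, mask):
    """Blend an edited region back into the source image.

    source, edited: float arrays of shape (H, W, C) with values in [0, 1].
    mask: float array of shape (H, W) in [0, 1]; 1 marks the edited region
          (e.g., a segmentation mask selecting the object to replace).

    Pixels where mask == 1 come from `edited`; pixels where mask == 0
    are kept from `source`, so the rest of the image is untouched.
    """
    alpha = mask[..., None]  # broadcast mask over the channel axis
    return alpha * edited + (1.0 - alpha) * source

# Toy usage: replace the masked half of a black image with white pixels.
source = np.zeros((2, 2, 3))
edited = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
result = composite_region(source, edited, mask)
```

In practice the `edited` image would come from a generative model (e.g., an inpainting diffusion model conditioned on a text prompt), and the mask from an interactive segmentation tool; soft (feathered) mask values between 0 and 1 yield smoother boundaries.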