research-article

CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model

Authors:

Chunxia Xiao

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 3647 - 3656

https://doi.org/10.1145/3664647.3681283

Published: 28 October 2024

Abstract

Inserting a foreground object into a given background scene and eliminating the illumination inconsistency (e.g., in color and brightness) between the two is an important and challenging task. It typically involves several subtasks, such as image harmonization and shadow generation. Many mature solutions exist for each of these subtasks, but they usually address only one of them. Recently, some image composition methods have used diffusion models to address both subtasks at once, but they cannot guarantee that the foreground content is fully reconstructed. In this work, we propose CFDiffusion, which handles image harmonization and shadow generation simultaneously. We first employ a shadow mask predictor to estimate the shadow mask of the foreground object. Next, we design a harmonization-shadow generator based on a diffusion model to harmonize the foreground and generate shadows concurrently. Additionally, we propose a foreground content enhancement module to ensure that foreground content is fully preserved at the insertion location, and we develop an adaptive encoder to guide the harmonization process in the foreground area. Experimental results on the iHarmony4 and IH-SG datasets demonstrate the superiority of our CFDiffusion approach.
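To make the problem setting concrete, the following is a minimal sketch of mask-based compositing followed by a naive brightness match. It is not the CFDiffusion method: the function names (`composite`, `region_mean`, `match_brightness`), the grayscale nested-list image format, and the mean-matching rule are all assumptions chosen for illustration. It only shows what "illumination inconsistency" between a pasted foreground and its background looks like numerically, and how a crude baseline might reduce it.

```python
# Illustrative baseline only; this is NOT the CFDiffusion pipeline.
# Assumption: images are grayscale, stored as nested lists of floats in [0, 1],
# and the binary mask contains both foreground (1) and background (0) pixels.

def composite(fg, bg, mask):
    """Paste fg onto bg wherever mask is 1 (binary-mask compositing)."""
    return [
        [m * f + (1 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, mask)
    ]

def region_mean(img, mask, inside):
    """Mean intensity over mask == 1 (inside=True) or mask == 0 (inside=False)."""
    vals = [
        p
        for row, m_row in zip(img, mask)
        for p, m in zip(row, m_row)
        if (m == 1) == inside
    ]
    return sum(vals) / len(vals)

def match_brightness(img, mask):
    """Scale foreground pixels toward the background mean, clipping at 1.0."""
    fg_mean = region_mean(img, mask, inside=True)
    bg_mean = region_mean(img, mask, inside=False)
    gain = bg_mean / fg_mean if fg_mean > 0 else 1.0
    return [
        [min(1.0, p * gain) if m == 1 else p for p, m in zip(row, m_row)]
        for row, m_row in zip(img, mask)
    ]

# A bright foreground pixel pasted into a dark 2x2 background.
fg = [[0.8, 0.8], [0.8, 0.8]]
bg = [[0.2, 0.2], [0.2, 0.2]]
mask = [[1, 0], [0, 0]]

comp = composite(fg, bg, mask)        # foreground is visibly too bright
harm = match_brightness(comp, mask)   # foreground pulled toward background mean
```

A global gain like this ignores color, spatial lighting variation, and cast shadows, which is the gap that learned harmonization and shadow generation methods aim to close.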

Index Terms

Computing methodologies → Computer graphics → Image manipulation

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
