DOI: 10.1145/3664647.3681283
research-article

CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model

Published: 28 October 2024

Abstract

Inserting a foreground object into a given background scene and eliminating the illumination inconsistency between them (e.g., in color and brightness) is an important and challenging task. It typically involves multiple subtasks, such as image harmonization and shadow generation. Mature solutions exist for each of these two subtasks, but they usually address only one of them. Recently, some image composition methods have used diffusion models to address both simultaneously, but they cannot guarantee complete reconstruction of the foreground content. In this work, we propose CFDiffusion, which handles image harmonization and shadow generation simultaneously. We first employ a shadow mask predictor to estimate the shadow mask of the foreground object. Next, we design a harmonization-shadow generator based on a diffusion model to harmonize the foreground and generate shadows concurrently. Additionally, we propose a foreground content enhancement module to ensure that foreground content is fully preserved at the insertion location, and we develop an adaptive encoder to guide the harmonization process in the foreground area. Experimental results on the iHarmony4 and IH-SG datasets demonstrate the superiority of our CFDiffusion approach.
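
To make the problem setting concrete, the following is a minimal sketch of how a composite image is typically formed and why the pasted foreground can look inconsistent with the background. It is not the CFDiffusion method: the function names (composite, region_mean, naive_brightness_match) and the tiny grayscale images are hypothetical, and the mean-shift step is only a naive stand-in baseline for harmonization, assuming a binary foreground mask with at least one foreground and one background pixel.

```python
from typing import List

Image = List[List[float]]  # grayscale image, intensities in [0, 1]


def composite(fg: Image, bg: Image, mask: Image) -> Image:
    """Paste fg onto bg where mask is 1 (standard alpha compositing)."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(fr, br, mr)]
        for fr, br, mr in zip(fg, bg, mask)
    ]


def region_mean(img: Image, mask: Image, inside: bool) -> float:
    """Mean intensity inside (inside=True) or outside the binary mask."""
    vals = [
        p
        for row, mrow in zip(img, mask)
        for p, m in zip(row, mrow)
        if (m > 0.5) == inside
    ]
    return sum(vals) / len(vals)


def naive_brightness_match(comp: Image, mask: Image) -> Image:
    """Naive baseline: shift foreground pixels so their mean matches the
    background mean. Background pixels are left untouched."""
    shift = region_mean(comp, mask, False) - region_mean(comp, mask, True)
    return [
        [min(1.0, max(0.0, p + shift)) if m > 0.5 else p
         for p, m in zip(row, mrow)]
        for row, mrow in zip(comp, mask)
    ]


# A dark background with a bright foreground pasted into the middle column.
bg = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]
fg = [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]
mask = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

comp = composite(fg, bg, mask)                # foreground is visibly too bright
matched = naive_brightness_match(comp, mask)  # foreground mean pulled to background
```

A global shift like this ignores color, local lighting, and shadows, and it alters every foreground pixel uniformly, which is the kind of limitation that motivates learned harmonization and shadow generation.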


Cited By

  • (2024) Foreground Harmonization and Shadow Generation for Composite Image. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 8267-8276. DOI: 10.1145/3664647.3681355. Online publication date: 28-Oct-2024


    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 October 2024

    Author Tags

    1. image composition
    2. image harmonization
    3. shadow generation

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 135
    • Downloads (Last 6 weeks): 51
    Reflects downloads up to 13 Feb 2025
