Abstract
In this paper, we introduce L-DiffER, a language-based diffusion model designed for the ill-posed single image reflection removal task. Although existing language-based diffusion models have shown impressive performance in image generation, they struggle with precise control and faithfulness in image restoration. To overcome these limitations, we propose an iterative condition refinement strategy that addresses the problem of inaccurate control conditions. A multi-condition constraint mechanism is employed to ensure faithful recovery of image color and structure while retaining the generative capability needed to handle low-transmitted reflections. Extensive experiments demonstrate both quantitative and qualitative improvements over existing methods.
Y. Hong and H. Zhong—Equal contributions.
Notes
- 1. Note that the spatial conditions vary with timesteps in the proposed method.
- 2. Details of \(\beta_t\) and \(\gamma_t\) are explained in the supplementary material.
- 3.
- 4. Evaluations on reflection layers are provided in the supplementary material.
- 5. More ablation studies are provided in the supplementary material.
Acknowledgement
This work is supported by the National Natural Science Foundation of China under Grant Nos. 62136001 and 62088102.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, Y., Zhong, H., Weng, S., Liang, J., Shi, B. (2025). L-DiffER: Single Image Reflection Removal with Language-Based Diffusion Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15078. Springer, Cham. https://doi.org/10.1007/978-3-031-72661-3_4
DOI: https://doi.org/10.1007/978-3-031-72661-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72660-6
Online ISBN: 978-3-031-72661-3
eBook Packages: Computer Science, Computer Science (R0)