Abstract
This paper addresses the limitations of adverse-weather image restoration approaches that are trained on synthetic data and then applied to real-world scenarios. We formulate a semi-supervised learning framework that employs vision-language models to improve restoration performance under diverse adverse weather conditions in real-world settings. Specifically, vision-language models assess image clearness and provide semantics on real data, and these assessments serve as supervision signals for training the restoration model. For clearness enhancement, we exploit real-world data through a dual-step strategy that combines pseudo-labels assessed by vision-language models with weather prompt learning. For semantic enhancement, we integrate real-world data by adjusting the weather conditions in vision-language model descriptions while preserving the semantic meaning. In addition, we introduce an effective training strategy to bootstrap restoration performance. Our approach achieves superior results on real-world adverse weather image restoration, as demonstrated by qualitative and quantitative comparisons with state-of-the-art methods. A minimal sketch of the pseudo-label idea is given below.
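To illustrate the abstract's core idea, the following is a minimal sketch, not the authors' implementation: a vision-language model (here, off-the-shelf CLIP with fixed text prompts; the paper instead learns weather prompts) scores the "clearness" of a real image, and a restored output is accepted as a pseudo-label only if the model judges it clearer than the degraded input. The prompt strings, threshold, and function names are hypothetical.

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical fixed prompts; the paper learns weather prompts rather than hand-writing them.
prompts = ["a clear photo", "a photo degraded by rain, haze, or snow"]
text_tokens = clip.tokenize(prompts).to(device)

@torch.no_grad()
def clearness_score(image_path: str) -> float:
    """Probability that the image matches the 'clear' prompt rather than the degraded one."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    logits_per_image, _ = model(image, text_tokens)
    probs = logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()  # index 0 corresponds to the "clear" prompt

def accept_pseudo_label(input_path: str, restored_path: str, margin: float = 0.1) -> bool:
    """Keep a restored output as a pseudo-label only if it is judged clearer than the
    degraded input by a margin (the 0.1 threshold is an illustrative assumption)."""
    return clearness_score(restored_path) > clearness_score(input_path) + margin
```

In a semi-supervised loop, such accepted restorations on unlabeled real images would supplement the synthetic paired data as supervision; the paper's full pipeline additionally uses VLM-provided semantics with weather terms adjusted in the descriptions.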
Acknowledgements
The work was supported by the National Key R&D Program of China (Grant No. 2022ZD0160100), the Research Grants Council of the Hong Kong Special Administrative Region, China (Grant No. 14201620), and the Hong Kong Innovation and Technology Fund (Grant No. MHP/092/22).