Abstract
All-in-one image restoration aims to recover various degraded images with a unified model. To adaptively reconstruct high-quality images, recent CNN- and Transformer-based models incorporate learnable prompts that dynamically acquire degradation-specific knowledge for different degraded inputs, achieving state-of-the-art restoration performance. However, existing methods exhibit limitations, including a high computational burden and inadequate modeling of long-range dependencies. To address these issues, we propose a reasoning and action prompt-driven Mamba-based image restoration model, named RamIR. Specifically, RamIR employs the Mamba block for long-range dependency modeling with computational complexity that is linear in the feature map size. Inspired by Chain-of-Thought (CoT) prompting, we integrate Reasoning and Action (ReAct) prompts within the Mamba block: a pretrained vision-language (PVL) model generates textual reasoning prompts that describe the type and severity of the degradation, while another PVL output serves as an action prompt representing the clean image caption. Employed in a CoT manner, these prompts enhance the network's sensitivity to degradation and elicit recovery actions tailored to each reasoning prompt. Additionally, we explore the seamless interaction between Mamba blocks and prompts, introducing a novel prompt-driven module (PDM) to facilitate prompt utilization. Extensive experimental results demonstrate the superior performance of RamIR and highlight its input-scaling efficiency over existing benchmark models for all-in-one image restoration.
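To make the design concrete, the following is a minimal PyTorch sketch of how a prompt-driven module might inject PVL text-prompt embeddings into Mamba-block features via cross-attention. It is an illustration under stated assumptions, not the paper's implementation: the class name, shapes, and residual fusion strategy are all hypothetical.

```python
# A minimal sketch (not the paper's PDM): pixel features attend to
# reasoning/action prompt tokens via cross-attention, fused residually.
import torch
import torch.nn as nn


class PromptDrivenModule(nn.Module):  # hypothetical name
    def __init__(self, dim: int, prompt_dim: int, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(prompt_dim, dim)  # map text embeddings to the feature dim
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) Mamba-block features; prompts: (B, L, prompt_dim) text tokens
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)       # (B, H*W, C): queries from pixels
        kv = self.proj(prompts)                   # (B, L, C): keys/values from prompts
        out, _ = self.attn(self.norm(q), kv, kv)  # pixels attend to prompt tokens
        return feat + out.transpose(1, 2).reshape(b, c, h, w)  # residual fusion
```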
Data Availability
All datasets used in this manuscript are cited; detailed information about each can be found with the corresponding entries in the references section. All cited data are publicly available and accessible.
References
Chen L, Chu X, Zhang X, Sun J (2022) Simple baselines for image restoration. In: Computer Vision - ECCV 2022: 17th European Conference. Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII. Springer, Berlin, Heidelberg, pp 17–33
Conde MV, Geigle G, Timofte R (2024) High-quality image restoration following human instructions. In: Proceedings of the european conference on computer vision (ECCV)
Li B, Liu X, Hu P, Wu Z, Lv J, Peng X (2022) All-in-one image restoration for unknown corruption. In: IEEE Conference on computer vision and pattern recognition. New Orleans, LA
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H (2022) Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Potlapalli V, Zamir SW, Khan S, Khan F (2023) Promptir: Prompting for all-in-one image restoration. In: Thirty-seventh conference on neural information processing systems
Zhang J, Huang J, Yao M, Yang Z, Yu H, Zhou M, Zhao F (2023) Ingredient-oriented multi-degradation learning for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5825–5835
Gu A, Goel K, Ré C (2022) Efficiently modeling long sequences with structured state spaces. In: The International conference on learning representations (ICLR)
Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752
Zhu L, Liao B, Zhang Q, Wang X, Liu W, Wang X (2024) Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv:2401.09417
Liu Y, Tian Y, Zhao Y, Yu H, Xie L, Wang Y, Ye Q, Liu Y (2024) Vmamba: Visual state space model. arXiv:2401.10166
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Controlling vision-language models for universal image restoration. arXiv:2310.01018
Lai X, Tian Z, Chen Y, Li Y, Yuan Y, Liu S, Jia J (2023) Lisa: Reasoning segmentation via large language model. arXiv:2308.00692
Li J, Li D, Xiong C, Hoi S (2022) Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International conference on machine learning
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. arXiv:2103.00020
Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y (2022) React: Synergizing reasoning and acting in language models. arXiv:2210.03629
Ren D, Zuo W, Hu Q, Zhu P, Meng D (2019) Progressive image deraining networks: a better and simpler baseline. In: IEEE Conference on computer vision and pattern recognition
Chen Z, He Z, Lu Z-M (2024) Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans Image Process 33:1002–1015
Song Y, He Z, Qian H, Du X (2023) Vision transformers for single image dehazing. IEEE Trans Image Process 32:1927–1941
Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155
Tsai F-J, Peng Y-T, Tsai C-C, Lin Y-Y, Lin C-W (2022) Banet: A blur-aware attention network for dynamic scene deblurring. IEEE Trans Image Process 31:6789–6799. https://doi.org/10.1109/TIP.2022.3216216
Deng R, Gu T (2024) Cu-mamba: Selective state space models with channel learning for image restoration. arXiv:2404.11778
Guo H, Li J, Dai T, Ouyang Z, Ren X, Xia S-T (2024) Mambair: A simple baseline for image restoration with state-space model. In: ECCV
Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R (2021) Swinir: Image restoration using swin transformer. arXiv:2108.10257
Chen X, Wang X, Zhou J, Qiao Y, Dong C (2023) Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 22367–22377
Zhou K, Yang J, Loy CC, Liu Z (2022) Conditional prompt learning for vision-language models. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Zhou K, Yang J, Loy CC, Liu Z (2022) Learning to prompt for vision-language models. Int J Comput Vis (IJCV)
Yang H, Pan L, Yang Y, Liang W (2024) Language-driven all-in-one adverse weather removal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 24902–24912
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Refusion: enabling large-size realistic image restoration with latent-space diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 1680–1691
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Image restoration with mean-reverting stochastic differential equations. In: International conference on machine learning
Liu J, Liu A, Lu X, Welleck S, West P, Bras RL, Choi Y, Hajishirzi H (2022) Generated knowledge prompting for commonsense reasoning. arXiv:2110.08387
Lu Y, Hong Y, Wang Z, Zhou G (2023) Enhancing reasoning capabilities by instruction learning and chain-of-thoughts for implicit discourse relation recognition. In: Conference on empirical methods in natural language processing
Liu Y, Peng X, Du T, Yin J, Liu W, Zhang X (2024) Era-cot: Improving chain-of-thought through entity relationship analysis. arXiv:2403.06932
Zhang Y, Wu Y, Liu Y, Peng X (2024) Cpa-enhancer: Chain-of-thought prompted adaptive enhancer for object detection under unknown degradations. arXiv:2403.11220
Yang F, Yang H, Fu J, Lu H, Guo B (2020) Learning texture transformer network for image super-resolution. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, Wang Z (2019) Benchmarking single-image dehazing and beyond. IEEE Trans Image Process 28(1):492–505. https://doi.org/10.1109/TIP.2018.2867951
Arbeláez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916. https://doi.org/10.1109/TPAMI.2010.161
Ma K, Duanmu Z, Wu Q, Wang Z, Yong H, Li H, Zhang L (2017) Waterloo exploration database: new challenges for image quality assessment models. IEEE Trans Image Process 26:1004–1016
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings eighth IEEE international conference on computer vision (ICCV 2001), vol 2, pp 416–423. https://doi.org/10.1109/ICCV.2001.937655
Huang J-B, Singh A, Ahuja N (2015) Single image super-resolution from transformed self-exemplars. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5197–5206. https://doi.org/10.1109/CVPR.2015.7299156
Wei C, Wang W, Yang W, Liu J (2018) Deep retinex decomposition for low-light enhancement. In: British machine vision conference. British Machine Vision Association
Nah S, Hyun Kim T, Mu Lee K (2017) Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Dong Y, Liu Y, Zhang H, Chen S, Qiao Y (2020) FD-GAN: generative adversarial networks with fusion-discriminator for single image dehazing. In: The Thirty-Fourth AAAI conference on artificial intelligence (AAAI 2020), New York, NY, USA, pp 10729–10736
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2021) Multi-stage progressive image restoration. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Fan Q, Chen D, Yuan L, Hua G, Yu N, Chen B (2021) A general decoupled learning framework for parameterized image operators. IEEE Trans Pattern Anal Mach Intell 43(1):33–47. https://doi.org/10.1109/TPAMI.2019.2925793
Chen L, Lu X, Zhang J, Chu X, Chen C (2021) Hinet: Half instance normalization network for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) Workshops, pp 182–192
Mou C, Wang Q, Zhang J (2022) Deep generalized unfolding networks for image restoration. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2020) Learning enriched features for real image restoration and enhancement. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV, pp. 492–511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-030-58595-2_30
Valanarasu JMJ, Yasarla R, Patel VM (2022) Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 2343–2353
Liu L, Xie L, Zhang X, Yuan S, Chen X, Zhou W, Li H, Tian Q (2022) Tape: Task-agnostic prior embedding for image restoration. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T (eds) Computer Vision - ECCV 2022. Springer, Cham, pp 447–464
Yasarla R, Patel VM (2019) Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining. In: 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8397–8406. https://doi.org/10.1109/CVPR.2019.00860
Ren W, Ma L, Zhang J, Pan J-S, Cao X, Liu W, Yang M-H (2018) Gated fusion network for single image dehazing. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 3253–3261
Li J, Li J, Fang F, Li F, Zhang G (2021) Luminance-aware pyramid network for low-light image enhancement. IEEE Trans Multimedia 23:3153–3165. https://doi.org/10.1109/TMM.2020.3021243
Chen D, He M, Fan Q, Liao J, Zhang L, Hou D, Yuan L, Hua G (2019) Gated context aggregation network for image dehazing and deraining. In: 2019 IEEE Winter conference on applications of computer vision (WACV), pp 1375–1383. https://doi.org/10.1109/WACV.2019.00151
Kupyn O, Martyniuk T, Wu J, Wang Z (2019) Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In: The IEEE International conference on computer vision (ICCV)
Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, Yang J, Zhou P, Wang Z (2021) Enlightengan: Deep light enhancement without paired supervision. IEEE Trans Image Process 30:2340–2349
Zhang J, Pan J, Ren J, Song Y, Bao L, Lau RWH, Yang M-H (2018) Dynamic scene deblurring using spatially variant recurrent neural networks. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 2521–2529. https://doi.org/10.1109/CVPR.2018.00267
Wu W, Weng J, Zhang P, Wang X, Yang W, Jiang J (2022) Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 5891–5900. https://doi.org/10.1109/CVPR52688.2022.00581
Dong J, Pan J, Yang Z, Tang J (2023) Multi-scale residual low-pass filter network for image deblurring. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp. 12311–12320. https://doi.org/10.1109/ICCV51070.2023.01134
Wang Y, Liu Z, Liu J, Xu S, Liu S (2023) Low-light image enhancement with illumination-aware gamma correction and complete image modelling network. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp 13082–13091. https://doi.org/10.1109/ICCV51070.2023.01207
Yang B, Qin L, Liu J, Liu X (2022) Ircnn: An irregular-time-distanced recurrent convolutional neural network for change detection in satellite time series. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2022.3154894
Zhang K, Zuo W, Zhang L (2018) Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Trans Image Process 27(9):4608–4622. https://doi.org/10.1109/TIP.2018.2839891
Shi Y, Xia B, Jin X, Wang X, Zhao T, Xia X, Xiao X, Yang W (2024) Vmambair: Visual state space model for image restoration. arXiv:2403.11423
Zhen Z, Hu Y, Feng Z (2024) Freqmamba: Viewing mamba from a frequency perspective for image deraining. arXiv:2404.09476
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.
Ethical and informed consent for data used
This study relied on openly accessible datasets, which are cited accordingly. As these datasets are publicly available and contain no personally identifiable information, individual informed consent was not required. Our use of these datasets complies with ethical guidelines and legal obligations while honoring the provisions set by the data providers.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Degradation-specific task results
We re-train the model on each single-degradation dataset. In Table 6 we compare against degradation-specific methods for deraining and dehazing, and in Table 7 for deblurring and low-light enhancement. Extensive comparisons for image denoising are shown in Table 8. Notably, our RamIR outperforms the PromptIR network with favorable gains of 0.9 dB and 1.34 dB on the deraining and dehazing tasks, respectively, indicating that the Mamba-based modeling of long-range dependencies significantly enhances the recovery process. Furthermore, for image deblurring, our approach surpasses CU-Mamba [21], another Mamba-based network, by 0.2 dB. The incorporation of the ReAct prompts further improves noise removal.
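For reference, the gains above are reported in PSNR. A minimal sketch of the computation, assuming images scaled to \([0, 1]\); details of the paper's evaluation protocol (color space, border cropping) may differ:

```python
# PSNR in dB between a restored image and its ground truth,
# assuming tensors scaled to [0, 1]; higher is better.
import torch


def psnr(restored: torch.Tensor, clean: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((restored - clean) ** 2)                # mean squared error
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()   # 10 * log10(MAX^2 / MSE)
```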
Appendix B: Experiments on out-of-distribution degradation
Thanks to our ReAct prompt learning, RamIR demonstrates promising "unified" image restoration capacity. To evaluate unseen degradations, we report results on BSD68 and Urban100 with noise level \(\sigma = 100\), which lies outside the training distribution. As shown in Table 9, our method achieves superior performance on these unseen settings.
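A minimal sketch of how this out-of-distribution setting can be reproduced: Gaussian noise at \(\sigma = 100\) (beyond the training range) is synthesized on clean images before restoration. The model and batch handles in the usage comment are hypothetical placeholders.

```python
# Synthesize out-of-distribution Gaussian noise; sigma is on the 0-255 scale.
import torch


def add_gaussian_noise(clean: torch.Tensor, sigma: float = 100.0) -> torch.Tensor:
    # clean: (B, C, H, W) in [0, 1]
    noisy = clean + torch.randn_like(clean) * (sigma / 255.0)
    return noisy.clamp(0.0, 1.0)


# Hypothetical usage:
# noisy = add_gaussian_noise(clean_batch, sigma=100.0)
# restored = ramir_model(noisy)        # model trained only on lower noise levels
# score = psnr(restored, clean_batch)  # psnr() as sketched in Appendix A
```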
Appendix C: Experiments of Mamba-based restoration network
Recently advanced state space models, Mamba in particular, have been extensively explored for image restoration because they resolve the dilemma of balancing efficient computation with a global effective receptive field. Our method offers a novel perspective by integrating PVL models with a Mamba-based restoration network. As Table 10 shows, our approach outperforms the state-of-the-art Mamba-based image deraining network, showcasing immense potential in the field of image restoration.
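To make the efficiency argument concrete, the sketch below shows the plain discretized state-space recurrence \(h_t = \bar{A} h_{t-1} + \bar{B} x_t\), \(y_t = C h_t\) that underlies Mamba: one pass over the sequence, so cost is linear in its length. Real Mamba implementations additionally make the parameters input-dependent (the selective scan) and fuse the loop into GPU kernels; this loop is illustrative only.

```python
# Plain (non-selective) diagonal SSM scan: D independent channels,
# each with an N-dimensional hidden state, updated in O(L) time.
import torch


def ssm_scan(x: torch.Tensor, A_bar: torch.Tensor,
             B_bar: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    # x: (L, D) sequence; A_bar, B_bar, C: (D, N) discretized SSM parameters.
    L, D = x.shape
    h = torch.zeros(D, A_bar.shape[1])               # hidden state per channel
    ys = []
    for t in range(L):                               # single pass: linear in L
        h = A_bar * h + B_bar * x[t].unsqueeze(-1)   # h_t = A_bar*h_{t-1} + B_bar*x_t
        ys.append((C * h).sum(-1))                   # y_t[d] = sum_n C[d,n] * h_t[d,n]
    return torch.stack(ys)                           # (L, D) outputs
```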
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, A., Wu, Y. & Zhang, Y. RamIR: Reasoning and action prompting with Mamba for all-in-one image restoration. Appl Intell 55, 258 (2025). https://doi.org/10.1007/s10489-024-06226-y