
RamIR: Reasoning and action prompting with Mamba for all-in-one image restoration

Applied Intelligence

Abstract

All-in-one image restoration aims to recover various degraded images with a single unified model. To adaptively reconstruct high-quality images, recent prevalent CNN- and Transformer-based models incorporate learnable prompts that dynamically acquire degradation-specific knowledge for different degraded inputs, achieving state-of-the-art restoration performance. However, existing methods exhibit limitations, including a high computational burden and inadequate modeling of long-range dependencies. To address these issues, we propose a reasoning and action prompt-driven Mamba-based image restoration model, named RamIR. Specifically, RamIR employs the Mamba block to model long-range dependencies with computational complexity that is linear in the feature map size. Inspired by Chain-of-Thought (CoT) prompting, we integrate Reasoning and Action (ReAct) prompts within the Mamba block: a pretrained vision-language (PVL) model generates textual reasoning prompts describing the type and severity of the degradation, while a second PVL output serves as an action prompt representing the clean image caption. Applied in a CoT manner, these prompts heighten the network's sensitivity to degradation and elicit targeted recovery actions tailored to each reasoning prompt. Additionally, we explore the seamless interaction between Mamba blocks and prompts, introducing a novel prompt-driven module (PDM) to facilitate prompt utilization. Extensive experimental results demonstrate the superior performance of RamIR and highlight its advantage in input-scaling efficiency over existing benchmark models for all-in-one image restoration.
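The published article does not include source code, so the following is only a minimal PyTorch sketch of how a prompt-driven module (PDM) of the kind described above might inject text-derived reasoning and action prompt embeddings into image features via cross-attention. The module name, dimensions, and the attention-based fusion are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptDrivenModule(nn.Module):
    """Illustrative sketch (not the paper's code): fuse text prompt
    embeddings into image features via cross-attention.

    `dim` is the channel width of the restoration backbone and `prompt_dim`
    the width of the frozen vision-language model's text embeddings; both
    are hypothetical placeholders, not values from the paper.
    """
    def __init__(self, dim: int = 64, prompt_dim: int = 512, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(prompt_dim, dim)   # map prompts into feature space
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; prompts: (B, N, prompt_dim)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        p = self.proj(prompts)                            # (B, N, C)
        out, _ = self.attn(query=tokens, key=p, value=p)  # attend to prompts
        tokens = self.norm(tokens + out)                  # residual fusion
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: in the described pipeline, `prompts` would be the frozen PVL
# encoder's embeddings of the reasoning prompt (degradation type/severity)
# and the action prompt (clean image caption).
feat = torch.randn(1, 64, 32, 32)
prompts = torch.randn(1, 2, 512)                  # two prompt tokens
print(PromptDrivenModule()(feat, prompts).shape)  # torch.Size([1, 64, 32, 32])
```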



Data Availability

All data cited in this manuscript are appropriately referenced; detailed information on the cited datasets can be found in the corresponding entries of the reference list. All cited data are publicly available and accessible.

References

  1. Chen L, Chu X, Zhang X, Sun J (2022) Simple baselines for image restoration. In: Computer Vision - ECCV 2022: 17th European Conference. Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII. Springer, Berlin, Heidelberg, pp 17–33

  2. Conde MV, Geigle G, Timofte R (2024) High-quality image restoration following human instructions. In: Proceedings of the european conference on computer vision (ECCV)

  3. Li B, Liu X, Hu P, Wu Z, Lv J, Peng X (2022) All-in-one image restoration for unknown corruption. In: IEEE Conference on computer vision and pattern recognition. New Orleans, LA

  4. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H (2022) Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  5. Potlapalli V, Zamir SW, Khan S, Khan F (2023) Promptir: Prompting for all-in-one image restoration. In: Thirty-seventh conference on neural information processing systems

  6. Zhang J, Huang J, Yao M, Yang Z, Yu H, Zhou M, Zhao F (2023) Ingredient-oriented multi-degradation learning for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5825–5835

  7. Gu A, Goel K, Ré C (2022) Efficiently modeling long sequences with structured state spaces. In: The International conference on learning representations (ICLR)

  8. Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752

  9. Zhu L, Liao B, Zhang Q, Wang X, Liu W, Wang X (2024) Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv:2401.09417

  10. Liu Y, Tian Y, Zhao Y, Yu H, Xie L, Wang Y, Ye Q, Liu Y (2024) Vmamba: Visual state space model. arXiv:2401.10166

  11. Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Controlling vision-language models for universal image restoration. arXiv:2310.01018

  12. Lai X, Tian Z, Chen Y, Li Y, Yuan Y, Liu S, Jia J (2023) Lisa: Reasoning segmentation via large language model. arXiv:2308.00692

  13. Li J, Li D, Xiong C, Hoi S (2022) Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International conference on machine learning

  14. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. arXiv:2103.00020

  15. Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y (2022) React: Synergizing reasoning and acting in language models. arXiv:2210.03629

  16. Ren D, Zuo W, Hu Q, Zhu P, Meng D (2019) Progressive image deraining networks: a better and simpler baseline. In: IEEE Conference on computer vision and pattern recognition

  17. Chen Z, He Z, Lu Z-M (2024) Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans Image Process 33:1002–1015

  18. Song Y, He Z, Qian H, Du X (2023) Vision transformers for single image dehazing. IEEE Trans Image Process 32:1927–1941

  19. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155

  20. Tsai F-J, Peng Y-T, Tsai C-C, Lin Y-Y, Lin C-W (2022) Banet: A blur-aware attention network for dynamic scene deblurring. IEEE Trans Image Process 31:6789–6799. https://doi.org/10.1109/TIP.2022.3216216

  21. Deng R, Gu T (2024) Cu-mamba: Selective state space models with channel learning for image restoration. arXiv:2404.11778

  22. Guo H, Li J, Dai T, Ouyang Z, Ren X, Xia S-T (2024) Mambair: A simple baseline for image restoration with state-space model. In: ECCV

  23. Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R (2021) Swinir: Image restoration using swin transformer. arXiv:2108.10257

  24. Chen X, Wang X, Zhou J, Qiao Y, Dong C (2023) Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 22367–22377

  25. Zhou K, Yang J, Loy CC, Liu Z (2022) Conditional prompt learning for vision-language models. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)

  26. Zhou K, Yang J, Loy CC, Liu Z (2022) Learning to prompt for vision-language models. Int J Comput Vis (IJCV)

  27. Yang H, Pan L, Yang Y, Liang W (2024) Language-driven all-in-one adverse weather removal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 24902–24912

  28. Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Refusion: enabling large-size realistic image restoration with latent-space diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 1680–1691

  29. Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Image restoration with mean-reverting stochastic differential equations. International conference on machine learning

  30. Liu J, Liu A, Lu X, Welleck S, West P, Bras RL, Choi Y, Hajishirzi H (2022) Generated knowledge prompting for commonsense reasoning. arXiv:2110.08387

  31. Lu Y, Hong Y, Wang Z, Zhou G (2023) Enhancing reasoning capabilities by instruction learning and chain-of-thoughts for implicit discourse relation recognition. In: Conference on empirical methods in natural language processing

  32. Liu Y, Peng X, Du T, Yin J, Liu W, Zhang X (2024) Era-cot: Improving chain-of-thought through entity relationship analysis. arXiv:2403.06932

  33. Zhang Y, Wu Y, Liu Y, Peng X (2024) Cpa-enhancer: Chain-of-thought prompted adaptive enhancer for object detection under unknown degradations. arXiv:2403.11220

  34. Yang F, Yang H, Fu J, Lu H, Guo B (2020) Learning texture transformer network for image super-resolution. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)

  35. Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, Wang Z (2019) Benchmarking single-image dehazing and beyond. IEEE Trans Image Process 28(1):492–505. https://doi.org/10.1109/TIP.2018.2867951

  36. Arbeláez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916. https://doi.org/10.1109/TPAMI.2010.161

  37. Ma K, Duanmu Z, Wu Q, Wang Z, Yong H, Li H, Zhang L (2017) Waterloo exploration database: new challenges for image quality assessment models. IEEE Trans Image Process 26:1004–1016

  38. Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE international conference on computer vision (ICCV 2001), vol 2, pp 416–423. https://doi.org/10.1109/ICCV.2001.937655

  39. Huang J-B, Singh A, Ahuja N (2015) Single image super-resolution from transformed self-exemplars. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5197–5206. https://doi.org/10.1109/CVPR.2015.7299156

  40. Wei C, Wang W, Yang W, Liu J (2018) Deep retinex decomposition for low-light enhancement. In: British machine vision conference. British Machine Vision Association

  41. Nah S, Hyun Kim T, Mu Lee K (2017) Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

  42. Dong Y, Liu Y, Zhang H, Chen S, Qiao Y (2020) FD-GAN: generative adversarial networks with fusion-discriminator for single image dehazing. In: The Thirty-Fourth AAAI conference on artificial intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp 10729–10736

  43. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2021) Multi-stage progressive image restoration. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)

  44. Fan Q, Chen D, Yuan L, Hua G, Yu N, Chen B (2021) A general decoupled learning framework for parameterized image operators. IEEE Trans Pattern Anal Mach Intell 43(1):33–47. https://doi.org/10.1109/TPAMI.2019.2925793

  45. Chen L, Lu X, Zhang J, Chu X, Chen C (2021) Hinet: Half instance normalization network for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) Workshops, pp 182–192

  46. Mou C, Wang Q, Zhang J (2022) Deep generalized unfolding networks for image restoration. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

  47. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2020) Learning enriched features for real image restoration and enhancement. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV, pp. 492–511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-030-58595-2_30

  48. Valanarasu JMJ, Yasarla R, Patel VM (2022) Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 2343–2353

  49. Liu L, Xie L, Zhang X, Yuan S, Chen X, Zhou W, Li H, Tian Q (2022) Tape: Task-agnostic prior embedding for image restoration. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T (eds) Computer Vision - ECCV 2022. Springer, Cham, pp 447–464

  50. Yasarla R, Patel VM (2019) Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining. In: 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8397–8406. https://doi.org/10.1109/CVPR.2019.00860

  51. Ren W, Ma L, Zhang J, Pan J-S, Cao X, Liu W, Yang M-H (2018) Gated fusion network for single image dehazing. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 3253–3261

  52. Li J, Li J, Fang F, Li F, Zhang G (2021) Luminance-aware pyramid network for low-light image enhancement. IEEE Trans Multimedia 23:3153–3165. https://doi.org/10.1109/TMM.2020.3021243

  53. Chen D, He M, Fan Q, Liao J, Zhang L, Hou D, Yuan L, Hua G (2019) Gated context aggregation network for image dehazing and deraining. In: 2019 IEEE Winter conference on applications of computer vision (WACV), pp 1375–1383. https://doi.org/10.1109/WACV.2019.00151

  54. Kupyn O, Martyniuk T, Wu J, Wang Z (2019) Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In: The IEEE International conference on computer vision (ICCV)

  55. Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, Yang J, Zhou P, Wang Z (2021) Enlightengan: Deep light enhancement without paired supervision. IEEE Trans Image Process 30:2340–2349

  56. Zhang J, Pan J, Ren J, Song Y, Bao L, Lau RWH, Yang M-H (2018) Dynamic scene deblurring using spatially variant recurrent neural networks. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 2521–2529. https://doi.org/10.1109/CVPR.2018.00267

  57. Wu W, Weng J, Zhang P, Wang X, Yang W, Jiang J (2022) Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 5891–5900. https://doi.org/10.1109/CVPR52688.2022.00581

  58. Dong J, Pan J, Yang Z, Tang J (2023) Multi-scale residual low-pass filter network for image deblurring. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp. 12311–12320. https://doi.org/10.1109/ICCV51070.2023.01134

  59. Wang Y, Liu Z, Liu J, Xu S, Liu S (2023) Low-light image enhancement with illumination-aware gamma correction and complete image modelling network. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp 13082–13091. https://doi.org/10.1109/ICCV51070.2023.01207

  60. Yang B, Qin L, Liu J, Liu X (2022) Ircnn: An irregular-time-distanced recurrent convolutional neural network for change detection in satellite time series. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2022.3154894

  61. Zhang K, Zuo W, Zhang L (2018) Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Trans Image Process 27(9):4608–4622. https://doi.org/10.1109/TIP.2018.2839891

  62. Shi Y, Xia B, Jin X, Wang X, Zhao T, Xia X, Xiao X, Yang W (2024) Vmambair: Visual state space model for image restoration. arXiv:2403.11423

  63. Zhen Z, Hu Y, Feng Z (2024) Freqmamba: Viewing mamba from a frequency perspective for image deraining. arXiv:2404.09476

Author information

Corresponding author

Correspondence to Yan Wu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.

Ethical and informed consent for data used

This study relied on openly accessible datasets, which are cited accordingly. As these datasets are publicly available and contain no personally identifiable information, individual informed consent was not required. Our use of these datasets complies with ethical guidelines and legal obligations and honors the terms set by the data providers.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Degradation-specific task results

We re-train the model on each single-degradation dataset. Table 6 compares against degradation-specific methods for deraining and dehazing, Table 7 covers deblurring and low-light enhancement, and extensive comparisons for image denoising are shown in Table 8. Notably, our RamIR outperforms the PromptIR network with favorable gains of 0.9 dB and 1.34 dB on the deraining and dehazing tasks, respectively, indicating that the Mamba-based network's modeling of long-range dependencies significantly enhances the recovery process. Furthermore, in image deblurring our approach surpasses CU-Mamba [21], another Mamba-based network, by 0.2 dB. The incorporation of the ReAct prompt further improves noise removal.

Table 6 Deraining and Dehazing comparison: We compare with task-specific classical methods on benchmark datasets
Table 7 Quantitative comparison with state-of-the-art methods of deblurring and low-light enhancement
Table 8 Comparison on image denoising with advanced methods; we report PSNR on benchmark datasets at different noise levels \(\sigma \)

Appendix B: Experiments on out-of-distribution degradation

Table 9 Out-of-distribution results for all-in-one image restoration under the S3 setting with denoising level \(\sigma =100\)

Thanks to our ReAct prompt learning, RamIR demonstrates promising “unified” image restoration capacity. To evaluate generalization to unseen degradations, we report results on BSD68 and Urban100 at noise level \(\sigma = 100\), which lies outside the training data distribution. As Table 9 shows, our method delivers superior performance on these unseen settings.
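To make this protocol concrete, the sketch below builds such an out-of-distribution test input by adding Gaussian noise with \(\sigma = 100\) on the 0–255 intensity scale and scores a result against the ground truth with PSNR, the metric reported in Table 9. The helper names are our own illustrative choices, not code from the paper.

```python
import numpy as np

def add_gaussian_noise(clean: np.ndarray, sigma: float = 100.0) -> np.ndarray:
    """Degrade a clean uint8 image with Gaussian noise of std `sigma` (0-255 scale)."""
    noisy = clean.astype(np.float64) + np.random.normal(0.0, sigma, clean.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def psnr(ref: np.ndarray, test: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Stand-in for a BSD68/Urban100 image; a real harness would load the dataset.
clean = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
noisy = add_gaussian_noise(clean, sigma=100.0)
print(f"PSNR of the degraded input: {psnr(clean, noisy):.2f} dB")
```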

Appendix C: Experiments of Mamba-based restoration network

Table 10 Quantitative comparison for image deraining

The recently advanced state space model Mamba has been explored in depth for image restoration to resolve the dilemma of balancing efficient computation against a global effective receptive field. Our method offers a novel perspective by integrating PVL models with a Mamba-based restoration network. As Table 10 shows, our approach outperforms state-of-the-art Mamba-based image deraining networks, demonstrating its strong potential for image restoration.
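For readers unfamiliar with the mechanism underlying such Mamba blocks, the following sketch shows the selective state-space recurrence of Gu and Dao [8] in its naive sequential form: input-dependent parameters are discretized at every step, so the cost grows linearly with sequence length while the hidden state carries global context. Practical implementations replace this Python loop with hardware-aware parallel scans; this is expository only and not the RamIR implementation.

```python
import torch

def selective_scan(x, delta, A, B, C):
    """Naive selective SSM recurrence, linear in sequence length L.

    x:     (L, D) input sequence      delta: (L, D) input-dependent step sizes
    A:     (D, N) state matrix        B, C:  (L, N) input-dependent projections
    Returns y of shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = torch.zeros(D, N)                                  # hidden state
    ys = []
    for t in range(L):
        # Per-step discretization; the "selectivity" comes from delta, B, C
        # all depending on the input at step t.
        A_bar = torch.exp(delta[t].unsqueeze(-1) * A)       # (D, N)
        B_bar = delta[t].unsqueeze(-1) * B[t].unsqueeze(0)  # (D, N)
        h = A_bar * h + B_bar * x[t].unsqueeze(-1)          # state update
        ys.append((h * C[t].unsqueeze(0)).sum(-1))          # project out, (D,)
    return torch.stack(ys)

L, D, N = 16, 8, 4
y = selective_scan(torch.randn(L, D), torch.rand(L, D),    # positive step sizes
                   -torch.rand(D, N),                      # stable (negative) A
                   torch.randn(L, N), torch.randn(L, N))
print(y.shape)  # torch.Size([16, 8])
```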

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Tang, A., Wu, Y. & Zhang, Y. RamIR: Reasoning and action prompting with Mamba for all-in-one image restoration. Appl Intell 55, 258 (2025). https://doi.org/10.1007/s10489-024-06226-y
