Abstract
All-in-one image restoration aims to recover various degraded images with a unified model. To adaptively reconstruct high-quality images, recent CNN- and Transformer-based models incorporate learnable prompts that dynamically acquire degradation-specific knowledge for different degraded inputs, achieving state-of-the-art restoration performance. However, existing methods exhibit limitations, including a high computational burden and inadequate modeling of long-range dependencies. To address these issues, we propose a reasoning and action prompt-driven Mamba-based image restoration model, named RamIR. Specifically, RamIR employs the Mamba block for long-range dependency modeling with computational complexity that is linear in the feature map size. Inspired by Chain-of-Thought (CoT) prompting, we integrate Reasoning and Action (ReAct) prompts within the Mamba block: a pretrained vision-language (PVL) model generates textual reasoning prompts that describe the type and severity of the degradation, while another PVL output serves as an action prompt representing the clean image caption. Employed in a CoT manner, these prompts enhance the network's sensitivity to degradation and elicit recovery actions tailored to each reasoning prompt. Additionally, we explore the seamless interaction between Mamba blocks and prompts, introducing a novel prompt-driven module (PDM) to facilitate prompt utilization. Extensive experimental results demonstrate the superior performance of RamIR and highlight its input-scaling efficiency over existing benchmark models for all-in-one image restoration.
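To make the design concrete, the following is a minimal PyTorch sketch of how a prompt-driven module might inject PVL text-prompt embeddings into Mamba-block features via cross-attention. It is an illustration under stated assumptions, not the paper's implementation: the class name, shapes, and residual fusion strategy are all hypothetical.

```python
# A minimal sketch (not the paper's PDM): pixel features attend to
# reasoning/action prompt tokens via cross-attention, fused residually.
import torch
import torch.nn as nn


class PromptDrivenModule(nn.Module):  # hypothetical name
    def __init__(self, dim: int, prompt_dim: int, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(prompt_dim, dim)  # map text embeddings to the feature dim
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) Mamba-block features; prompts: (B, L, prompt_dim) text tokens
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)       # (B, H*W, C): queries from pixels
        kv = self.proj(prompts)                   # (B, L, C): keys/values from prompts
        out, _ = self.attn(self.norm(q), kv, kv)  # pixels attend to prompt tokens
        return feat + out.transpose(1, 2).reshape(b, c, h, w)  # residual fusion
```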
Data Availability
All datasets used in this manuscript are cited; detailed information about each can be found with the corresponding entries in the references section. All cited data are publicly available and accessible.
References
Chen L, Chu X, Zhang X, Sun J (2022) Simple baselines for image restoration. In: Computer Vision - ECCV 2022: 17th European Conference. Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII. Springer, Berlin, Heidelberg, pp 17–33
Conde MV, Geigle G, Timofte R (2024) High-quality image restoration following human instructions. In: Proceedings of the european conference on computer vision (ECCV)
Li B, Liu X, Hu P, Wu Z, Lv J, Peng X (2022) All-in-one image restoration for unknown corruption. In: IEEE Conference on computer vision and pattern recognition. New Orleans, LA
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H (2022) Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Potlapalli V, Zamir SW, Khan S, Khan F (2023) Promptir: Prompting for all-in-one image restoration. In: Thirty-seventh conference on neural information processing systems
Zhang J, Huang J, Yao M, Yang Z, Yu H, Zhou M, Zhao F (2023) Ingredient-oriented multi-degradation learning for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5825–5835
Gu A, Goel K, Ré C (2022) Efficiently modeling long sequences with structured state spaces. In: The International conference on learning representations (ICLR)
Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752
Zhu L, Liao B, Zhang Q, Wang X, Liu W, Wang X (2024) Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv:2401.09417
Liu Y, Tian Y, Zhao Y, Yu H, Xie L, Wang Y, Ye Q, Liu Y (2024) Vmamba: Visual state space model. arXiv:2401.10166
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Controlling vision-language models for universal image restoration. arXiv:2310.01018
Lai X, Tian Z, Chen Y, Li Y, Yuan Y, Liu S, Jia J (2023) Lisa: Reasoning segmentation via large language model. arXiv:2308.00692
Li J, Li D, Xiong C, Hoi S (2022) Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International conference on machine learning
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. arXiv:2103.00020
Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y (2022) React: Synergizing reasoning and acting in language models. arXiv:2210.03629
Ren D, Zuo W, Hu Q, Zhu P, Meng D (2019) Progressive image deraining networks: a better and simpler baseline. In: IEEE Conference on computer vision and pattern recognition
Chen Z, He Z, Lu Z-M (2024) Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans Image Process 33:1002–1015
Song Y, He Z, Qian H, Du X (2023) Vision transformers for single image dehazing. IEEE Trans Image Process 32:1927–1941
Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155
Tsai F-J, Peng Y-T, Tsai C-C, Lin Y-Y, Lin C-W (2022) Banet: A blur-aware attention network for dynamic scene deblurring. IEEE Trans Image Process 31:6789–6799. https://doi.org/10.1109/TIP.2022.3216216
Deng R, Gu T (2024) Cu-mamba: Selective state space models with channel learning for image restoration. arXiv:2404.11778
Guo H, Li J, Dai T, Ouyang Z, Ren X, Xia S-T (2024) Mambair: A simple baseline for image restoration with state-space model. In: ECCV
Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R (2021) Swinir: Image restoration using swin transformer. arXiv:2108.10257
Chen X, Wang X, Zhou J, Qiao Y, Dong C (2023) Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 22367–22377
Zhou K, Yang J, Loy CC, Liu Z (2022) Conditional prompt learning for vision-language models. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Zhou K, Yang J, Loy CC, Liu Z (2022) Learning to prompt for vision-language models. Int J Comput Vis (IJCV)
Yang H, Pan L, Yang Y, Liang W (2024) Language-driven all-in-one adverse weather removal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 24902–24912
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Refusion: enabling large-size realistic image restoration with latent-space diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 1680–1691
Luo Z, Gustafsson FK, Zhao Z, Sjölund J, Schön TB (2023) Image restoration with mean-reverting stochastic differential equations. In: International conference on machine learning
Liu J, Liu A, Lu X, Welleck S, West P, Bras RL, Choi Y, Hajishirzi H (2022) Generated knowledge prompting for commonsense reasoning. arXiv:2110.08387
Lu Y, Hong Y, Wang Z, Zhou G (2023) Enhancing reasoning capabilities by instruction learning and chain-of-thoughts for implicit discourse relation recognition. In: Conference on empirical methods in natural language processing
Liu Y, Peng X, Du T, Yin J, Liu W, Zhang X (2024) Era-cot: Improving chain-of-thought through entity relationship analysis. arXiv:2403.06932
Zhang Y, Wu Y, Liu Y, Peng X (2024) Cpa-enhancer: Chain-of-thought prompted adaptive enhancer for object detection under unknown degradations. arXiv:2403.11220
Yang F, Yang H, Fu J, Lu H, Guo B (2020) Learning texture transformer network for image super-resolution. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, Wang Z (2019) Benchmarking single-image dehazing and beyond. IEEE Trans Image Process 28(1):492–505. https://doi.org/10.1109/TIP.2018.2867951
Arbeláez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916. https://doi.org/10.1109/TPAMI.2010.161
Ma K, Duanmu Z, Wu Q, Wang Z, Yong H, Li H, Zhang L (2017) Waterloo exploration database: new challenges for image quality assessment models. IEEE Trans Image Process 26:1004–1016
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings eighth IEEE international conference on computer vision (ICCV 2001), vol 2, pp 416–423. https://doi.org/10.1109/ICCV.2001.937655
Huang J-B, Singh A, Ahuja N (2015) Single image super-resolution from transformed self-exemplars. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5197–5206. https://doi.org/10.1109/CVPR.2015.7299156
Wei C, Wang W, Yang W, Liu J (2018) Deep retinex decomposition for low-light enhancement. In: British machine vision conference. British Machine Vision Association
Nah S, Hyun Kim T, Mu Lee K (2017) Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Dong Y, Liu Y, Zhang H, Chen S, Qiao Y (2020) FD-GAN: generative adversarial networks with fusion-discriminator for single image dehazing. In: The Thirty-Fourth AAAI conference on artificial intelligence (AAAI 2020), New York, NY, USA, pp 10729–10736
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2021) Multi-stage progressive image restoration. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR)
Fan Q, Chen D, Yuan L, Hua G, Yu N, Chen B (2021) A general decoupled learning framework for parameterized image operators. IEEE Trans Pattern Anal Mach Intell 43(1):33–47. https://doi.org/10.1109/TPAMI.2019.2925793
Chen L, Lu X, Zhang J, Chu X, Chen C (2021) Hinet: Half instance normalization network for image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) Workshops, pp 182–192
Mou C, Wang Q, Zhang J (2022) Deep generalized unfolding networks for image restoration. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2020) Learning enriched features for real image restoration and enhancement. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV, pp. 492–511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-030-58595-2_30
Valanarasu JMJ, Yasarla R, Patel VM (2022) Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 2343–2353
Liu L, Xie L, Zhang X, Yuan S, Chen X, Zhou W, Li H, Tian Q (2022) Tape: Task-agnostic prior embedding for image restoration. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T (eds) Computer Vision - ECCV 2022. Springer, Cham, pp 447–464
Yasarla R, Patel VM (2019) Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining. In: 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8397–8406. https://doi.org/10.1109/CVPR.2019.00860
Ren W, Ma L, Zhang J, Pan J-S, Cao X, Liu W, Yang M-H (2018) Gated fusion network for single image dehazing. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 3253–3261
Li J, Li J, Fang F, Li F, Zhang G (2021) Luminance-aware pyramid network for low-light image enhancement. IEEE Trans Multimedia 23:3153–3165. https://doi.org/10.1109/TMM.2020.3021243
Chen D, He M, Fan Q, Liao J, Zhang L, Hou D, Yuan L, Hua G (2019) Gated context aggregation network for image dehazing and deraining. In: 2019 IEEE Winter conference on applications of computer vision (WACV), pp 1375–1383. https://doi.org/10.1109/WACV.2019.00151
Kupyn O, Martyniuk T, Wu J, Wang Z (2019) Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In: The IEEE International conference on computer vision (ICCV)
Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, Yang J, Zhou P, Wang Z (2021) Enlightengan: Deep light enhancement without paired supervision. IEEE Trans Image Process 30:2340–2349
Zhang J, Pan J, Ren J, Song Y, Bao L, Lau RWH, Yang M-H (2018) Dynamic scene deblurring using spatially variant recurrent neural networks. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 2521–2529. https://doi.org/10.1109/CVPR.2018.00267
Wu W, Weng J, Zhang P, Wang X, Yang W, Jiang J (2022) Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 5891–5900. https://doi.org/10.1109/CVPR52688.2022.00581
Dong J, Pan J, Yang Z, Tang J (2023) Multi-scale residual low-pass filter network for image deblurring. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp. 12311–12320. https://doi.org/10.1109/ICCV51070.2023.01134
Wang Y, Liu Z, Liu J, Xu S, Liu S (2023) Low-light image enhancement with illumination-aware gamma correction and complete image modelling network. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp 13082–13091. https://doi.org/10.1109/ICCV51070.2023.01207
Yang B, Qin L, Liu J, Liu X (2022) Ircnn: An irregular-time-distanced recurrent convolutional neural network for change detection in satellite time series. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2022.3154894
Zhang K, Zuo W, Zhang L (2018) Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Trans Image Process 27(9):4608–4622. https://doi.org/10.1109/TIP.2018.2839891
Shi Y, Xia B, Jin X, Wang X, Zhao T, Xia X, Xiao X, Yang W (2024) Vmambair: Visual state space model for image restoration. arXiv:2403.11423
Zhen Z, Hu Y, Feng Z (2024) Freqmamba: Viewing mamba from a frequency perspective for image deraining. arXiv:2404.09476
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.
Ethical and informed consent for data used
This study relied on openly accessible datasets, which are cited accordingly. As these datasets are publicly available and contain no personally identifiable information, individual informed consent was not required. Our use of these datasets complies with ethical guidelines and legal obligations while honoring the provisions set by the data providers.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Degradation-specific task results
We re-train the model on each single-degradation dataset. In Table 6 we compare against degradation-specific methods for deraining and dehazing, and in Table 7 for deblurring and low-light enhancement. Extensive comparisons for image denoising are shown in Table 8. Notably, our RamIR outperforms the PromptIR network with favorable gains of 0.9 dB and 1.34 dB on the deraining and dehazing tasks, respectively, indicating that the Mamba-based modeling of long-range dependencies significantly enhances the recovery process. Furthermore, for image deblurring, our approach surpasses CU-Mamba [21], another Mamba-based network, by 0.2 dB. The incorporation of the ReAct prompts further improves noise removal.
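For reference, the gains above are reported in PSNR. A minimal sketch of the computation, assuming images scaled to \([0, 1]\); details of the paper's evaluation protocol (color space, border cropping) may differ:

```python
# PSNR in dB between a restored image and its ground truth,
# assuming tensors scaled to [0, 1]; higher is better.
import torch


def psnr(restored: torch.Tensor, clean: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((restored - clean) ** 2)                # mean squared error
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()   # 10 * log10(MAX^2 / MSE)
```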
Appendix B: Experiments on out-of-distribution degradation
Thanks to our ReAct prompt learning, RamIR demonstrates promising "unified" image restoration capacity. To evaluate unseen degradations, we report results on BSD68 and Urban100 with noise level \(\sigma = 100\), which lies outside the training distribution. As shown in Table 9, our method achieves superior performance on these unseen settings.
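A minimal sketch of how this out-of-distribution setting can be reproduced: Gaussian noise at \(\sigma = 100\) (beyond the training range) is synthesized on clean images before restoration. The model and batch handles in the usage comment are hypothetical placeholders.

```python
# Synthesize out-of-distribution Gaussian noise; sigma is on the 0-255 scale.
import torch


def add_gaussian_noise(clean: torch.Tensor, sigma: float = 100.0) -> torch.Tensor:
    # clean: (B, C, H, W) in [0, 1]
    noisy = clean + torch.randn_like(clean) * (sigma / 255.0)
    return noisy.clamp(0.0, 1.0)


# Hypothetical usage:
# noisy = add_gaussian_noise(clean_batch, sigma=100.0)
# restored = ramir_model(noisy)        # model trained only on lower noise levels
# score = psnr(restored, clean_batch)  # psnr() as sketched in Appendix A
```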
Appendix C: Experiments of Mamba-based restoration network
Recently advanced state space models, Mamba in particular, have been extensively explored for image restoration because they resolve the dilemma of balancing efficient computation with a global effective receptive field. Our method offers a novel perspective by integrating PVL models with a Mamba-based restoration network. As Table 10 shows, our approach outperforms the state-of-the-art Mamba-based image deraining network, showcasing immense potential in the field of image restoration.
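To make the efficiency argument concrete, the sketch below shows the plain discretized state-space recurrence \(h_t = \bar{A} h_{t-1} + \bar{B} x_t\), \(y_t = C h_t\) that underlies Mamba: one pass over the sequence, so cost is linear in its length. Real Mamba implementations additionally make the parameters input-dependent (the selective scan) and fuse the loop into GPU kernels; this loop is illustrative only.

```python
# Plain (non-selective) diagonal SSM scan: D independent channels,
# each with an N-dimensional hidden state, updated in O(L) time.
import torch


def ssm_scan(x: torch.Tensor, A_bar: torch.Tensor,
             B_bar: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    # x: (L, D) sequence; A_bar, B_bar, C: (D, N) discretized SSM parameters.
    L, D = x.shape
    h = torch.zeros(D, A_bar.shape[1])               # hidden state per channel
    ys = []
    for t in range(L):                               # single pass: linear in L
        h = A_bar * h + B_bar * x[t].unsqueeze(-1)   # h_t = A_bar*h_{t-1} + B_bar*x_t
        ys.append((C * h).sum(-1))                   # y_t[d] = sum_n C[d,n] * h_t[d,n]
    return torch.stack(ys)                           # (L, D) outputs
```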
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, A., Wu, Y. & Zhang, Y. RamIR: Reasoning and action prompting with Mamba for all-in-one image restoration. Appl Intell 55, 258 (2025). https://doi.org/10.1007/s10489-024-06226-y