
RainFormer: a pyramid transformer for single image deraining


Abstract

Rain impairs the performance of outdoor vision systems, such as automated driving and outdoor surveillance systems. As an image preprocessing technique, image deraining therefore has great application potential. Limitations of convolutional neural networks, namely their small receptive field and lack of adaptivity to the input content, restrict further improvement of deraining performance. Recently, the transformer, a novel neural network architecture, has demonstrated impressive performance on natural language processing and vision tasks. However, applying transformers to image deraining still raises two issues. First, although transformers have powerful long-range modeling capabilities, they lack the ability to model local features, which is critical for image deraining. Second, transformers process images as fixed-size patches, so pixels at patch boundaries cannot exploit the local features of neighboring pixels to restore rain-free content. In this paper, we propose a novel pyramid transformer for image deraining. To address the first issue, we design a residual-Dconv feed-forward network (RDFN), in which depth-wise convolution improves the modeling of local features. To address the second issue, we introduce multi-resolution features into the transformer, which provides patches at different scales and thus enables boundary pixels to utilize local features. Furthermore, we propose a novel multi-scale fusion bridge (MSFB) to effectively integrate the extracted multi-scale features and capture the correlations between scales. Extensive experiments on synthetic and real-world images demonstrate that the proposed deraining model achieves superior performance; in particular, it reaches a PSNR of 47.55 dB on the SPA-Data dataset. We further validate the effectiveness of the proposed model on subsequent high-level computer vision tasks.
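To make the idea behind the RDFN concrete, the following is a minimal PyTorch sketch of a feed-forward block built around a depth-wise convolution with a residual connection. It is an illustration under stated assumptions, not the authors' exact design: the class name, channel count, expansion factor, and layer ordering are hypothetical choices made for this sketch.

```python
# Minimal sketch of a residual depth-wise-convolution feed-forward block
# (in the spirit of the RDFN described in the abstract). All hyperparameters
# here are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class ResidualDConvFFN(nn.Module):
    """Feed-forward network with a depth-wise convolution and a residual path."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.norm = nn.LayerNorm(channels)                       # normalize along the channel dim
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)  # point-wise channel expansion
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)         # depth-wise: one filter per channel
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)  # point-wise projection back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); apply LayerNorm in channel-last layout, then convolutions
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        y = self.project(self.act(self.dwconv(self.expand(y))))
        return x + y                                               # residual connection


if __name__ == "__main__":
    block = ResidualDConvFFN(channels=48)
    out = block(torch.randn(1, 48, 64, 64))
    print(out.shape)  # torch.Size([1, 48, 64, 64])
```

The design choice illustrated here is that the depth-wise convolution injects locality into the otherwise global token-mixing pipeline of a transformer block, while the residual path preserves the input signal.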




Data and code availability

Code and data are available.


Funding

This work was supported by the National Natural Science Foundation of China under Grants 62066047 and 61966037.

Author information

Corresponding author

Correspondence to Dongming Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Ethical approval

All experiments in this article were carried out computationally; they involve no harm to humans or animals and raise no moral or ethical concerns.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, H., Zhou, D., Cao, J. et al. RainFormer: a pyramid transformer for single image deraining. J Supercomput 79, 6115–6140 (2023). https://doi.org/10.1007/s11227-022-04895-5

