
RainFormer: a pyramid transformer for single image deraining


Abstract

Rain impairs the performance of outdoor vision systems, such as automated driving and outdoor surveillance systems. As an image preprocessing technique, image deraining therefore has great application potential. Limitations of convolutional neural networks, namely their small receptive field and lack of adaptivity to the input content, restrict further improvement of deraining performance. Recently, the transformer, a novel neural network architecture, has demonstrated impressive performance on natural language processing and vision tasks. However, applying transformers to image deraining still raises two issues. First, although transformers have powerful long-range modeling capabilities, they lack the ability to model local features, which is critical for image deraining. Second, transformers process images as fixed-size patches, so pixels at patch boundaries cannot exploit the local features of neighboring pixels to restore rain-free content. In this paper, we propose a novel pyramid transformer for image deraining. To address the first issue, we design a residual-Dconv feed-forward network (RDFN), in which depth-wise convolution improves the modeling of local features. To address the second issue, we introduce multi-resolution features into the transformer, which provides patches at different scales and thus enables boundary pixels to utilize local features. Furthermore, we propose a novel multi-scale fusion bridge (MSFB) to effectively integrate the extracted multi-scale features and capture the correlations between scales. Extensive experiments on synthetic and real-world images demonstrate that the proposed deraining model achieves superior performance; in particular, it reaches a PSNR of 47.55 dB on the SPA-Data dataset. We further validate the effectiveness of the proposed model on subsequent high-level computer vision tasks.
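To make the idea behind the RDFN concrete, the following is a minimal PyTorch sketch of a feed-forward block built around a depth-wise convolution with a residual connection. It is an illustration under stated assumptions, not the authors' exact design: the class name, channel count, expansion factor, and layer ordering are hypothetical choices made for this sketch.

```python
# Minimal sketch of a residual depth-wise-convolution feed-forward block
# (in the spirit of the RDFN described in the abstract). All hyperparameters
# here are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class ResidualDConvFFN(nn.Module):
    """Feed-forward network with a depth-wise convolution and a residual path."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.norm = nn.LayerNorm(channels)                       # normalize along the channel dim
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)  # point-wise channel expansion
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)         # depth-wise: one filter per channel
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)  # point-wise projection back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); apply LayerNorm in channel-last layout, then convolutions
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        y = self.project(self.act(self.dwconv(self.expand(y))))
        return x + y                                               # residual connection


if __name__ == "__main__":
    block = ResidualDConvFFN(channels=48)
    out = block(torch.randn(1, 48, 64, 64))
    print(out.shape)  # torch.Size([1, 48, 64, 64])
```

The design choice illustrated here is that the depth-wise convolution injects locality into the otherwise global token-mixing pipeline of a transformer block, while the residual path preserves the input signal.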




Data and code availability

Code and data are available.


Funding

This work was supported by the National Natural Science Foundation of China under Grants 62066047 and 61966037.

Author information

Corresponding author

Correspondence to Dongming Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Ethical approval

All experiments in this article were carried out computationally; they involve no harm to humans or animals and raise no moral or ethical concerns.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, H., Zhou, D., Cao, J. et al. RainFormer: a pyramid transformer for single image deraining. J Supercomput 79, 6115–6140 (2023). https://doi.org/10.1007/s11227-022-04895-5

