Image denoising using channel attention residual enhanced Swin Transformer

Multimedia Tools and Applications

Abstract

Transformers have achieved remarkable results in high-level vision tasks, but their application to low-level computer vision tasks such as image denoising remains largely unexplored. In this paper, we propose a channel attention residual enhanced Swin Transformer denoising network (CARSTDn), an efficient and effective Transformer-based architecture. CARSTDn consists of three modules: shallow feature extraction, deep feature extraction, and image reconstruction. The deep feature extraction module is the core of CARSTDn, and it employs a channel attention residual Swin Transformer block (CARSTB). Our benchmarking results demonstrate that CARSTDn outperforms existing state-of-the-art methods. We hope that our work will inspire further research into the use of Transformer-based architectures for image denoising tasks.
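The three-module layout described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer-level details, so everything here is an assumption made for illustration. Specifically, the squeeze-and-excitation style gate used for channel attention, the 1x1 projections used for shallow feature extraction and reconstruction, the long skip connection, the residual (noise-correcting) output, and all function and parameter names (`channel_attention`, `residual_channel_attention_block`, `denoise`, `w_in`, `w_out`, `w_down`, `w_up`) are hypothetical. The Swin Transformer layers inside each block are stood in for by an arbitrary callable `body`.

```python
import numpy as np


def channel_attention(x, w_down, w_up):
    """Squeeze-and-excitation style gate on a (C, H, W) feature map (assumed form)."""
    squeezed = x.mean(axis=(1, 2))                     # global average pool -> (C,)
    hidden = np.maximum(w_down @ squeezed, 0.0)        # bottleneck + ReLU -> (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))     # sigmoid gate in (0, 1) -> (C,)
    return x * scale[:, None, None]                    # re-weight each channel


def residual_channel_attention_block(x, body, w_down, w_up):
    """Hypothetical block: body(x) is gated per channel, then added back to x."""
    return x + channel_attention(body(x), w_down, w_up)


def denoise(noisy, w_in, w_out, blocks):
    """Three stages: shallow features, deep features, image reconstruction."""
    # Shallow feature extraction: 1x1 projection from image to feature channels.
    shallow = np.einsum("fc,chw->fhw", w_in, noisy)
    # Deep feature extraction: a stack of residual channel attention blocks.
    deep = shallow
    for body, w_down, w_up in blocks:
        deep = residual_channel_attention_block(deep, body, w_down, w_up)
    deep = deep + shallow                              # long skip connection (assumption)
    # Reconstruction: project back to image channels and add to the input (assumption).
    return noisy + np.einsum("cf,fhw->chw", w_out, deep)


rng = np.random.default_rng(0)
C, r = 16, 4                                           # feature channels, reduction ratio
noisy = rng.standard_normal((3, 8, 8))                 # toy 3-channel 8x8 "image"
w_in = 0.1 * rng.standard_normal((C, 3))
w_out = 0.1 * rng.standard_normal((3, C))
# Identity stands in for the Swin Transformer layers of each block.
blocks = [
    (lambda t: t, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
    for _ in range(2)
]
restored = denoise(noisy, w_in, w_out, blocks)
```

In a real model the random weights would be learned and `body` would be replaced by windowed self-attention layers; the sketch only shows how the three stages and the per-channel gate fit together.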


Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dai, Q., Cheng, X. & Zhang, L. Image denoising using channel attention residual enhanced Swin Transformer. Multimed Tools Appl 83, 19041–19059 (2024). https://doi.org/10.1007/s11042-023-16209-9

  • DOI: https://doi.org/10.1007/s11042-023-16209-9
