Image denoising using channel attention residual enhanced Swin Transformer

Dai, Qiang; Cheng, Xi; Zhang, Li

doi:10.1007/s11042-023-16209-9

Published: 25 July 2023

Volume 83, pages 19041–19059, (2024)

Multimedia Tools and Applications

Abstract

Transformers have achieved remarkable results in high-level vision tasks, but their application in low-level computer vision tasks such as image denoising remains largely unexplored. In this paper, we propose a novel channel attention residual enhanced Swin Transformer denoising network (CARSTDn), which is an efficient and effective Transformer-based architecture. CARSTDn consists of three modules: shallow feature extraction, deep feature extraction, and image reconstruction modules. The deep feature extraction module is the core of CARSTDn, and it employs a channel attention residual Swin Transformer block (CARSTB). Our benchmarking results demonstrate that CARSTDn outperforms existing state-of-the-art methods, showcasing its superiority. We hope that our work will inspire further research into the use of Transformer-based architectures for image denoising tasks.
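
The abstract names a channel attention residual block but gives no layer-level detail, so the following is only a minimal sketch of the general idea, assuming a squeeze-and-excitation style channel attention with a skip connection. The function name `channel_attention_residual`, the weight matrices `w1` and `w2`, and the exact residual form are illustrative assumptions, not the CARSTB definition from the paper; the Swin Transformer layers are omitted entirely.

```python
import math


def channel_attention_residual(x, w1, w2):
    """Hypothetical channel attention with a residual connection.

    x  : list of C channels, each an H x W list of lists of floats
    w1 : (C_reduced x C) weights of the first fully connected layer (assumed)
    w2 : (C x C_reduced) weights of the second fully connected layer (assumed)
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Rescale each channel by its gate and add the input back (residual).
    return [
        [[v + g * v for v in row] for row in ch] for ch, g in zip(x, gates)
    ]


# Two 2x2 channels; all-zero weights make every gate equal sigmoid(0) = 0.5.
features = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = channel_attention_residual(features, [[0.0, 0.0]], [[0.0], [0.0]])
```

The sketch uses plain Python lists so it runs without dependencies; a practical implementation would use a tensor library and learned weights.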

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Qiang Dai & Li Zhang
School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
Xi Cheng
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, 999077, China
Xi Cheng

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dai, Q., Cheng, X. & Zhang, L. Image denoising using channel attention residual enhanced Swin Transformer. Multimed Tools Appl 83, 19041–19059 (2024). https://doi.org/10.1007/s11042-023-16209-9

Received: 11 February 2022
Revised: 23 April 2023
Accepted: 04 July 2023
Published: 25 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-16209-9
