Fusing Convolution and Self-Attention Parallel in Frequency Domain for Image Deblurring

Neural Processing Letters

Abstract

Image deblurring is a challenging problem in image restoration, and solving it well requires both the local details and the global information of an image. This paper therefore proposes a deblurring model that integrates convolution and a Transformer in parallel. Unlike existing methods that use a single operator or combine operators serially, the model's convolution and self-attention branches are learned in parallel, extract features separately, and have their features fused in the frequency domain. Convolution is well suited to extracting local information, whereas self-attention focuses more on global information, so the parallel structure lets the model capture both at once. The frequency-domain fusion module is derived from an analysis of the mathematical model of image blur, and the results of that analysis indicate that the proposed design is reasonable. Experiments on multiple deblurring datasets verify that both the parallel Convolution-Transformer structure and the frequency-domain fusion method are effective.
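The parallel structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the layer design or the fusion rule, so the depthwise 3x3 convolution, the single-head attention over all pixels, the per-frequency weights `w_local` and `w_global`, and every function name below are assumptions chosen only to show two branches computed side by side and merged after a 2-D FFT.

```python
import numpy as np

def local_branch(x, kernel):
    """Local branch (assumed): depthwise 3x3 cross-correlation, zero padding."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return out

def global_branch(x, wq, wk, wv):
    """Global branch (assumed): single-head self-attention over all pixels."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v).reshape(h, w, c)

def fuse_in_frequency(local, global_, w_local, w_global):
    """Hypothetical fusion rule: weighted sum of the two spectra, then inverse FFT."""
    f_local = np.fft.fft2(local, axes=(0, 1))
    f_global = np.fft.fft2(global_, axes=(0, 1))
    fused = w_local * f_local + w_global * f_global
    # Keep the real part; arbitrary weights may leave a small imaginary residue.
    return np.fft.ifft2(fused, axes=(0, 1)).real

def parallel_block(x, params):
    """Both branches read the same input; neither feeds the other."""
    local = local_branch(x, params["kernel"])
    global_ = global_branch(x, params["wq"], params["wk"], params["wv"])
    return fuse_in_frequency(local, global_, params["w_local"], params["w_global"])

rng = np.random.default_rng(0)
h, w, c = 8, 8, 4
x = rng.standard_normal((h, w, c))
params = {
    "kernel": np.full((3, 3), 1.0 / 9.0),       # box filter stands in for learned weights
    "wq": rng.standard_normal((c, c)),
    "wk": rng.standard_normal((c, c)),
    "wv": rng.standard_normal((c, c)),
    "w_local": np.ones((h, w, 1)),               # per-frequency weights (learnable in a real model)
    "w_global": np.ones((h, w, 1)),
}
y = parallel_block(x, params)
print(y.shape)  # (8, 8, 4)
```

With all fusion weights set to one, the block reduces to a plain sum of the two branch outputs because the Fourier transform is linear. Weights that vary across frequencies would instead let each branch contribute more in some frequency bands than in others.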

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

No funding was received for conducting this study.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by XH. The first draft of the manuscript was written by XH, and JH commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to JingSong He.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, X., He, J. Fusing Convolution and Self-Attention Parallel in Frequency Domain for Image Deblurring. Neural Process Lett 55, 9811–9829 (2023). https://doi.org/10.1007/s11063-023-11228-x

  • DOI: https://doi.org/10.1007/s11063-023-11228-x
