Abstract
Deblurring is a challenging problem in image restoration, and it requires both the local details and the global information of an image. This paper therefore proposes a deblurring model that combines convolution and a Transformer in parallel. Unlike existing methods that use a single operator or a serial combination, the convolution operation and the self-attention mechanism in this model are learned in parallel and extract features separately; the features are then fused in the frequency domain. Convolution is well suited to extracting local information, while self-attention focuses more on global information, so the parallel structure enables the model to capture both simultaneously. The frequency-domain fusion module is derived from an analysis of the mathematical model of image blur, and the results indicate that this design is reasonable. Experiments on multiple deblurring datasets verify that both the parallel Convolution-Transformer structure and the frequency-domain fusion method are effective.
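The parallel-then-fuse idea can be illustrated with a toy sketch. The code below is not the authors' network. It is a minimal 1-D illustration in which a local branch (a small circular filter) and a global branch (single-head self-attention) process the same input independently, and their outputs are combined with per-frequency weights after a Fourier transform. Everything specific here is an assumption made for illustration: the 1-D signal, the scalar tokens with Q = K = V, the fixed kernel, the hand-picked weights `w_local` and `w_global` (a real model would learn these), and all function names.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def conv_branch(x, kernel):
    """Local branch: circular filtering with a small odd-length kernel
    (cross-correlation, as in deep-learning 'convolution' layers)."""
    n, r = len(x), len(kernel) // 2
    return [sum(kernel[j] * x[(i + j - r) % n] for j in range(len(kernel)))
            for i in range(n)]


def attention_branch(x):
    """Global branch: single-head self-attention over scalar tokens,
    with Q = K = V = x (an illustrative simplification)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def fuse_in_frequency(local, global_, w_local, w_global):
    """Combine the two branches with per-frequency weights, then invert.
    The weights must satisfy w[k] == w[n - k] so the result is real;
    the tiny imaginary residue from rounding is discarded."""
    L, G = dft(local), dft(global_)
    fused = [wl * a + wg * b for wl, wg, a, b in zip(w_local, w_global, L, G)]
    return [v.real for v in idft(fused)]


signal = [0.0, 0.2, 0.9, 1.0, 0.8, 0.3, 0.1, 0.0]
n = len(signal)

# The two branches see the same input and run independently (in parallel).
local = conv_branch(signal, [0.25, 0.5, 0.25])
global_ = attention_branch(signal)

# Hypothetical hand-picked weights: low frequencies lean on the global branch,
# high frequencies on the local branch. A trained model would learn them.
w_local = [min(k, n - k) / (n / 2) for k in range(n)]
w_global = [1.0 - w for w in w_local]

fused = fuse_in_frequency(local, global_, w_local, w_global)
```

One reason to fuse in the frequency domain rather than by plain addition is the convolution theorem: under the common uniform-blur model, blurring is a convolution in the image domain and therefore an element-wise product in the Fourier domain, so per-frequency weighting acts directly on the quantities the blur modifies. Whether the paper's module takes exactly this form is not stated in the abstract.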
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
No funding was received for conducting this study.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by XH. The first draft of the manuscript was written by XH, and JH commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Huang, X., He, J. Fusing Convolution and Self-Attention Parallel in Frequency Domain for Image Deblurring. Neural Process Lett 55, 9811–9829 (2023). https://doi.org/10.1007/s11063-023-11228-x
DOI: https://doi.org/10.1007/s11063-023-11228-x