Reference guided image super-resolution via efficient dense warping and adaptive fusion
Introduction
Image super-resolution (SR) aims to recover a high-resolution (HR) image from a low-resolution (LR) input. With the wide application of convolutional neural networks (CNNs), single image SR (SISR) methods have improved greatly [1], [2], [3], [4], [5], [6], [7], [8], [9]. To push performance further, SISR networks have become increasingly complex [5]. However, further gains, especially in visual quality, are difficult to achieve because the information available in a single LR image is limited: state-of-the-art SISR methods obtain high PSNR values, but their results are overly smooth [10]. In this situation, reference guided SR (RefSR), which boosts SR performance by borrowing high-frequency (HF) details from reference images, is emerging [11], [12], [13], [14], [15], [16], [17]. With the rapid development of the internet and digital storage, it is increasingly likely that similar high-quality images can be found in photo albums or on the internet. RefSR therefore becomes a feasible way to super-resolve LR images whose resolution is limited by lighting conditions, motion, or less advanced camera hardware.
Although the reference images and the LR input share textural (or structural) similarities, they are also misaligned, since the reference images are usually captured under different viewpoints, scales, and illuminations. Therefore, how to align the reference images with the LR input, and how to efficiently exploit the warped references, are the two main challenges in RefSR. A common warping strategy is patch matching at the pixel level [11], [13] or feature level [14]. However, these methods have high computational complexity, since exhaustive patch searching is time consuming; moreover, straightforward patch matching cannot handle non-rigid image deformation well. Another strategy is dense flow based warping [12], [15], [16], which is much faster than patch matching and can handle non-rigid deformation thanks to the pixel-wise dense flow. For example, the work in [16] obtains the dense flow via a revised spatial transformer network [18] to super-resolve LR faces, and Zheng et al. [12] utilize FlowNetS [19] to estimate the dense flow. However, these methods can only handle small misalignments.
Inspired by the efficiency of dense flow matching, we combine and improve existing optical flow algorithms so that they can match the same target under large displacements, illumination changes, and perspective changes. In this paper, we propose a coarse-to-fine image warping method. The coarse warping is realized by an image-level homography transform, and the fine warping is based on the optical flow method CPM [20], which builds a feature pyramid for layer-by-layer matching. Note that the homography transform is necessary even though we utilize CPM for the dense warping: CPM can only handle misalignments within a restricted region, and it does not work well when there is a scale difference between the reference and the LR input.
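The two warping stages can be sketched as backward warps, assuming the homography H (e.g. estimated from keypoint matches with RANSAC) and the dense flow field (e.g. from CPM or any flow estimator) are already given. The helper names and nearest-neighbour sampling below are illustrative simplifications, not the paper's implementation:

```python
import numpy as np

def warp_homography(img, H):
    """Coarse, image-level warping: backward-warp img by homography H.

    Each target pixel (x, y) is mapped through H^-1 and sampled from img
    (nearest-neighbour here, for brevity).
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = np.linalg.inv(H) @ tgt
    src = src[:2] / src[2]                                  # dehomogenize
    sx = np.clip(np.round(src[0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, h - 1)
    return img[sy, sx].reshape(h, w)

def warp_flow(img, flow):
    """Fine, pixel-level warping: backward-warp img by a dense flow field.

    flow[y, x] = (dx, dy): target pixel (y, x) samples img at (y+dy, x+dx),
    so non-rigid deformations are handled per pixel.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]
```

A reference would then be aligned in two steps, `warp_flow(warp_homography(ref, H), flow)`: the homography removes the global scale/perspective difference first, so the dense flow only needs to resolve the remaining small, local misalignments.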
After warping, we adopt an encoder–decoder network structure to super-resolve the LR image with the warped references. Since the references have different degrees of similarity to the LR input, we further propose a feature fusion strategy to fuse the reference features together. Quantitative and qualitative experiments on three benchmark datasets demonstrate our superiority over state-of-the-art single image SR and RefSR methods. Our main contributions are summarized as follows:
- To handle large displacement, we propose a coarse-to-fine dense warping strategy. We utilize a global homography transform for the coarse warping and improve the fast optical flow method CPM to make it suitable for the fine warping of reference images. Compared with using only coarse or only fine warping, the proposed method can deal with both large- and small-scale misalignments well, outperforming both the traditional patch matching method [13] and CNN based dense warping [12].
- Since different reference images have different degrees of similarity to the LR input, we propose an adaptive feature fusion method that fuses the complementary information of different references according to the cosine similarity between the LR features and the corresponding reference features. Compared with the cosine similarity based patch matching used in SRNTT, the proposed method can effectively select the most useful reference pixel for each position.
- Our method achieves the best SR performance compared with six state-of-the-art SR methods. In addition, our method is about seven and five times faster than the state-of-the-art general RefSR methods IENet [13] and SRNTT [14], respectively.
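The adaptive fusion in the second contribution can be sketched as a per-position "max fusion": at every spatial location, keep the reference feature whose cosine similarity to the LR feature is highest. The function name and the feature tensor shapes below are assumptions for illustration, not the paper's exact network layout:

```python
import numpy as np

def adaptive_max_fusion(lr_feat, ref_feats, eps=1e-8):
    """Fuse multiple warped-reference feature maps per spatial position.

    lr_feat:   (C, H, W) features of the LR input.
    ref_feats: (R, C, H, W) features of R warped references.
    Returns the fused (C, H, W) features and the (H, W) map of the
    best cosine similarity achieved at each position.
    """
    # L2-normalize along the channel axis so dot products are cosines
    lr_n = lr_feat / (np.linalg.norm(lr_feat, axis=0, keepdims=True) + eps)
    ref_n = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + eps)
    sim = np.einsum('chw,rchw->rhw', lr_n, ref_n)   # (R, H, W) cosine maps
    best = sim.argmax(axis=0)                       # (H, W) index of best ref
    R, C, H, W = ref_feats.shape
    hh, ww = np.mgrid[0:H, 0:W]
    fused = ref_feats[best, :, hh, ww]              # gather -> (H, W, C)
    return fused.transpose(2, 0, 1), sim.max(axis=0)
```

Because the selection is made independently at each position, complementary regions of different references can all contribute, instead of one globally best reference dominating.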
In the following, we introduce related work in Section 2 and describe our method in detail in Section 3. We then compare our method with state-of-the-art SR methods quantitatively and qualitatively in Section 4. Section 5 concludes the paper.
Related work
In this section, we give an overview of related work on single image super-resolution, reference image guided super-resolution and optical flow.
The proposed method
As shown in Fig. 1, our method consists of two modules: reference image warping and warped-reference guided SR. The warping module includes image-level global warping and pixel-level dense warping. The SR module is realized by an encoder–decoder network structure, into which we integrate an adaptive fusion module to improve the SR performance. In the following, we give details of these modules.
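The overall dataflow of the two modules can be sketched as below; `warp`, `encode`, `fuse`, and `decode` are placeholders for the warping stage, the encoder, the adaptive fusion module, and the decoder, whose internals are not specified at this level:

```python
def refsr_pipeline(lr_up, refs, encode, decode, warp, fuse):
    """High-level dataflow of the proposed method (a sketch, not the
    actual network): warp every reference toward the upsampled LR input,
    encode everything, fuse the reference features adaptively, and decode
    the fused features together with the LR features into the SR output.
    """
    warped = [warp(r, lr_up) for r in refs]       # module 1: reference warping
    lr_feat = encode(lr_up)                       # module 2: encoder ...
    ref_feats = [encode(w) for w in warped]
    fused = fuse(lr_feat, ref_feats)              # ... adaptive fusion ...
    return decode(lr_feat, fused)                 # ... decoder
```

The point of the structure is that fusion happens in feature space, after warping, so the decoder sees reference detail that is already spatially aligned with the LR content.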
Dataset and training details
Our training set contains 100 groups of landmark images, among which 84 groups are from the Oxford Buildings Dataset [52] and Google Images, and 16 groups were captured by ourselves. We construct the training data from these three sources so as to cover different similarity levels and different scenes. In each group, one image is the target and the other three are reference images, which depict the same object but are captured under different viewpoints, scales, and illuminations.
Conclusion
This paper proposes a novel multi-reference based SR method. We first propose a coarse-to-fine warping method that aligns the references with the LR input, so that the warped references are similar to the LR input at the pixel level. We then utilize an encoder–decoder network to super-resolve the LR input under the guidance of the warped references. Since the reference images have different degrees of similarity to the LR input, we propose an effective max fusion method based on the cosine similarity between the LR features and the corresponding reference features.
CRediT authorship contribution statement
Huanjing Yue: Conceptualization, Methodology, Resources, Writing - review & editing, Funding acquisition. Tong Zhou: Conceptualization, Methodology, Software, Data curation, Writing - original draft. Zhongyu Jiang: Resources, Writing - review & editing. Jingyu Yang: Resources, Project administration, Funding acquisition. Chunping Hou: Resources, Investigation, Validation, Supervision.
Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: This work was supported in part by the National Natural Science Foundation of China under Grant 61672378, Grant 62072331, and Grant 61771339.
References
- et al., A deep learning method for image super-resolution based on geometric similarity, Signal Process. Image Commun. (2019).
- et al., Single image super resolution using local smoothness and nonlocal self-similarity priors, Signal Process. Image Commun. (2016).
- et al., An edge-guided image interpolation algorithm via directional filtering and data fusion, IEEE Trans. Image Process. (2006).
- et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2015).
- J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings...
- W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep Laplacian pyramid networks for fast and accurate super-resolution, ...
- B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced deep residual networks for single image super-resolution, in: 2017...
- Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention...
- C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., ...
- M.S. Sajjadi, B. Scholkopf, M. Hirsch, EnhanceNet: Single image super-resolution through automated texture synthesis, ...
- Landmark image super-resolution by retrieving web images, IEEE Trans. Image Process.
- IENet: Internal and external patch matching convnet for web image guided denoising, IEEE Trans. Circuits Syst. Video Technol.
- Reference guided deep super-resolution via manifold localized external compensation, IEEE Trans. Circuits Syst. Video Technol.
- Spatial transformer networks.
- Coarse-to-fine PatchMatch for dense correspondence, IEEE Trans. Circuits Syst. Video Technol.
- Accelerating the super-resolution convolutional neural network.
- Total variation super resolution using a variational approach.