Reference guided image super-resolution via efficient dense warping and adaptive fusion

https://doi.org/10.1016/j.image.2020.116062

Highlights

  • We propose a coarse-to-fine dense warping strategy.

  • We propose an adaptive feature fusion method to fuse different references.

  • Our method achieves the best SR performance compared with six state-of-the-art SR methods.

Abstract

Due to the limited improvement of single-image based super-resolution (SR) methods in recent years, reference based image SR (RefSR) methods, which super-resolve the low-resolution (LR) input under the guidance of similar high-resolution (HR) reference images, are emerging. There are two main challenges in RefSR, i.e., reference image warping and exploiting the guidance information in the warped references. For reference warping, we propose an efficient dense warping method to deal with large displacements, which is much faster than the traditional patch (or texture) matching strategy. For the SR process, since different reference images complement each other and have different similarities with the LR image, we further propose a similarity based feature fusion strategy to take advantage of the most similar reference regions. The SR process is realized by an encoder–decoder network trained with a pixel-level reconstruction loss, a degradation loss, and a feature-level perceptual loss. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art SR methods in both subjective and objective measurements.

Introduction

Image super-resolution (SR) aims to super-resolve a low-resolution (LR) input image into a high-resolution (HR) version. With the wide application of convolutional neural networks (CNNs), single image based SR (SISR) methods have improved greatly [1], [2], [3], [4], [5], [6], [7], [8], [9]. To improve performance, the networks of SISR methods are becoming more complex [5]. However, further improving performance, especially visual quality, remains a big challenge due to the limited information available in a single image. State-of-the-art SISR methods obtain high PSNR values, but their results are over-smooth [10]. In this situation, reference guided SR (RefSR), which can boost SR performance by stealing high frequency (HF) details from the references, is emerging [11], [12], [13], [14], [15], [16], [17]. With the rapid development of the internet and digital storage, it has become easier to find similar high-quality images in photo albums and on the internet. Therefore, RefSR becomes more feasible for super-resolving LR images, whose resolution is low due to lighting conditions, movement, or less advanced camera hardware.

Although the reference images and the LR input have textural (or structural) similarities, they also have misalignments, since the reference images are usually captured with different viewpoints, scales, and illuminations. Therefore, how to align the reference images with the LR input and how to efficiently take advantage of the warped reference images are the two main challenges in RefSR. A common strategy for reference warping is patch matching at the pixel level [11], [13] or feature level [14]. However, these methods have high computational complexity since exhaustive patch searching is time consuming. In addition, straightforward patch matching cannot handle non-rigid image deformation well. Another strategy is dense flow based warping [12], [15], [16], which is much faster than patch matching. Moreover, it can handle non-rigid deformation thanks to the pixel-wise dense flow. For example, the work in [16] obtains the dense flow via a revised spatial transformer network [18] to super-resolve LR faces. Zheng et al. [12] utilize FlowNetS [19] to estimate the dense flow. However, these methods can only handle small misalignments.
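
The dense flow based warping idea above can be sketched in a few lines of NumPy. This is only an illustration of backward warping with a given pixel-wise flow, under simplifying assumptions (nearest-neighbor sampling, edge clamping; the cited methods estimate the flow with CNNs and typically use bilinear sampling), and `flow_warp` is a hypothetical helper, not code from any of the cited works:

```python
import numpy as np

def flow_warp(ref, flow):
    """Backward-warp a reference image with a pixel-wise dense flow.

    For every target pixel (y, x), sample the reference at
    (y + flow[1, y, x], x + flow[0, y, x]), i.e. flow[0] holds the
    horizontal offsets and flow[1] the vertical offsets.
    Nearest-neighbor sampling; coordinates are clamped to the image.
    """
    h, w = ref.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + flow[0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + flow[1]).astype(int), 0, h - 1)
    return ref[sy, sx]
```

A zero flow reproduces the reference unchanged, while a constant horizontal flow shifts it, which makes the backward-sampling convention easy to check.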

Inspired by the efficiency of dense flow matching, we combine and improve an existing optical flow algorithm so that it can match the same target under large displacements, illumination changes, and perspective changes. In this paper, we propose a coarse-to-fine image warping method. The coarse warping is realized by image-level homography warping, and the fine warping is based on the optical flow method CPM [20], which builds a feature pyramid for layer-by-layer matching. Note that the homography transform is necessary even though we utilize CPM for the dense warping, since CPM can only handle misalignments within a restricted region and does not work well if there is a scale difference between the reference and the LR input.
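
The image-level coarse warping can likewise be sketched in NumPy. Assuming a 3x3 homography H that maps reference coordinates into the LR frame has already been estimated (e.g. from matched keypoints), the reference is inverse-warped into the LR view; `homography_warp` is a hypothetical helper with nearest-neighbor sampling and zero fill, a minimal sketch rather than the paper's implementation:

```python
import numpy as np

def homography_warp(ref, H, out_shape):
    """Coarsely align a reference image to the target view.

    H maps reference coordinates to target coordinates, so each target
    pixel is filled by sampling the reference at H^{-1} applied to its
    homogeneous coordinate (inverse warping). Out-of-bounds samples
    are filled with zeros.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    # Homogeneous target coordinates, mapped back into the reference.
    coords = np.stack([xs, ys, ones]).reshape(3, -1).astype(np.float64)
    src = np.linalg.inv(H) @ coords
    src /= src[2]  # dehomogenize
    sx = np.round(src[0]).astype(int).reshape(h, w)
    sy = np.round(src[1]).astype(int).reshape(h, w)
    valid = (sx >= 0) & (sx < ref.shape[1]) & (sy >= 0) & (sy < ref.shape[0])
    out = np.zeros((h, w), dtype=ref.dtype)
    out[valid] = ref[sy[valid], sx[valid]]
    return out
```

With the identity homography the reference passes through unchanged, and a pure-translation homography shifts it, which is a quick sanity check of the inverse-warping convention.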

After warping, we adopt an encoder–decoder network structure to super-resolve the LR image with the warped references. Since the references have different similarities with the LR input, we further propose a feature fusion strategy to fuse the reference features together. Quantitative and qualitative experiments on three benchmark datasets demonstrate our superiority over state-of-the-art single image SR and RefSR methods. Our main contributions are summarized as follows:

  • To handle large displacement, we propose a coarse-to-fine dense warping strategy. We utilize global homography transform for the coarse warping and improve the fast optical flow method CPM to make it suitable for the fine warping of reference images. Compared with using only coarse or fine warping, the proposed method can deal with both large and small scale misalignments well, outperforming both the traditional patch matching method [13] and CNN based dense warping [12].

  • Since different reference images have different similarities with the LR input, we propose an adaptive feature fusion method to fuse the complementary information of different references according to the cosine similarity between the LR features and the corresponding reference features. Compared with the cosine similarity based patch matching used in SRNTT, the proposed method can effectively select the most useful reference pixel for each position.

  • Our method achieves the best SR performance compared with six state-of-the-art SR methods. In addition, our method is about seven and five times faster than state-of-the-art general RefSR methods IENet [13] and SRNTT [14], respectively.
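
The similarity based selection in the second contribution can be sketched as follows: at every spatial position, keep the reference feature vector whose cosine similarity to the LR feature vector is highest. This NumPy sketch only illustrates that per-pixel max-selection rule; `cosine_max_fusion` and its shapes are our own illustrative assumptions, not the paper's network code:

```python
import numpy as np

def cosine_max_fusion(f_lr, f_refs, eps=1e-8):
    """Fuse warped-reference feature maps by per-pixel cosine similarity.

    f_lr: LR feature map of shape (C, H, W).
    f_refs: list of warped reference feature maps, each (C, H, W).
    Returns the fused (C, H, W) features, where each position takes the
    feature vector of the most similar reference, and the (H, W) map of
    the selected similarities.
    """
    def normalize(f):
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)

    lr_n = normalize(f_lr)
    # Per-pixel cosine similarity of the LR features with each reference.
    sims = np.stack([(lr_n * normalize(f)).sum(axis=0) for f in f_refs])
    best = sims.argmax(axis=0)                # (H, W) index of best reference
    stacked = np.stack(f_refs)                # (R, C, H, W)
    h_idx, w_idx = np.indices(best.shape)
    fused = stacked[best, :, h_idx, w_idx]    # advanced indexing -> (H, W, C)
    return np.transpose(fused, (2, 0, 1)), sims.max(axis=0)
```

If one reference matches the LR features perfectly and another is anti-correlated, the fusion should return the matching reference everywhere with similarity close to 1.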

In the following, we introduce related work in Section 2 and our method in detail in Section 3. We then compare our method with state-of-the-art SR methods quantitatively and qualitatively in Section 4. Section 5 concludes the paper.

Section snippets

Related work

In this section, we give an overview of related work on single image super-resolution, reference image guided super-resolution and optical flow.

The proposed method

As shown in Fig. 1, our method consists of two modules: reference image warping and warped reference guided SR. The warping module includes image-level global warping and pixel-level dense warping. The SR module is realized by an encoder–decoder network structure, into which we integrate an adaptive fusion module to improve the SR performance. In the following, we give details of these modules.

Dataset and training details

Our training set contains 100 groups of landmark images, among which 84 groups are from the Oxford Buildings Dataset [52] and Google Images, and 16 groups are captured by ourselves. We construct training data from these three sources so as to cover different similarity levels and different scenes. In each group, one image is the target image and the other three are reference images, which depict the same object but are captured under different viewpoints, scales,

Conclusion

This paper proposes a novel multi-reference based SR method. We first propose a coarse-to-fine warping method to align the references with the LR input, so that the warped references are similar to the LR input at the pixel level. Thereafter, we utilize an encoder–decoder network to super-resolve the LR input with the guidance of the warped references. Since the reference images have different similarities with the LR input, we propose an effective max fusion method based on the cosine similarity between

CRediT authorship contribution statement

Huanjing Yue: Conceptualization, Methodology, Resources, Writing - review & editing, Funding acquisition. Tong Zhou: Conceptualization, Methodology, Software, Data curation, Writing - original draft. Zhongyu Jiang: Resources, Writing - review & editing. Jingyu Yang: Resources, Project administration, Funding acquisition. Chunping Hou: Resources, Investigation, Validation, Supervision.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: This work was supported in part by the National Natural Science Foundation of China under Grant 61672378, Grant 62072331, and Grant 61771339.

References (54)

  • Lu J., et al., A deep learning method for image super-resolution based on geometric similarity, Signal Process., Image Commun. (2019)
  • Chen H., et al., Single image super resolution using local smoothness and nonlocal self-similarity priors, Signal Process., Image Commun. (2016)
  • Zhang L., et al., An edge-guided image interpolation algorithm via directional filtering and data fusion, IEEE Trans. Image Process. (2006)
  • Dong C., et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings...
  • W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep laplacian pyramid networks for fast and accurate super-resolution,...
  • B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced deep residual networks for single image super-resolution, in: 2017...
  • Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention...
  • C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al....
  • M.S. Sajjadi, B. Scholkopf, M. Hirsch, Enhancenet: Single image super-resolution through automated texture synthesis,...
  • X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, C. Change Loy, Esrgan: Enhanced super-resolution generative...
  • M. Haris, G. Shakhnarovich, N. Ukita, Deep back-projection networks for super-resolution, in: Proceedings of the IEEE...
  • Yue H., et al., Landmark image super-resolution by retrieving web images, IEEE Trans. Image Process. (2013)
  • H. Zheng, M. Ji, H. Wang, Y. Liu, L. Fang, Crossnet: An end-to-end reference-based super resolution network using...
  • Yue H., et al., IENet: Internal and external patch matching convnet for web image guided denoising, IEEE Trans. Circuits Syst. Video Technol. (2019)
  • Z. Zhang, Z. Wang, Z. Lin, H. Qi, Image super-resolution by neural texture transfer, in: Proceedings of the IEEE...
  • X. Li, M. Liu, Y. Ye, W. Zuo, L. Lin, R. Yang, Learning warped guidance for blind face restoration, in: Proceedings of...
  • B. Dogan, S. Gu, R. Timofte, Exemplar guided face image super-resolution without facial landmarks, in: Proceedings of...
  • Yang W., et al., Reference guided deep super-resolution via manifold localized external compensation, IEEE Trans. Circuits Syst. Video Technol. (2018)
  • Jaderberg M., et al., Spatial transformer networks
  • A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van Der Smagt, D. Cremers, T. Brox, Flownet:...
  • Li Y., et al., Coarse-to-fine patchMatch for dense correspondence, IEEE Trans. Circuits Syst. Video Technol. (2017)
  • Dong C., et al., Accelerating the super-resolution convolutional neural network
  • T. Tong, G. Li, X. Liu, Q. Gao, Image super-resolution using dense skip connections, in: 2017 IEEE International...
  • G. Huang, Z. Liu, V.D.M. Laurens, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the...
  • Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, W. Wu, Feedback network for image super-resolution, in: Proceedings of the...
  • Babacan S.D., et al., Total variation super resolution using a variational approach
