
Self-supervised Reference-Based Image Super-Resolution with Conditional Diffusion Model

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15522)

Included in the following conference series: MultiMedia Modeling (MMM 2025)


Abstract

Reference-based super-resolution (RefSR) has gained attention for its superior performance due to the introduction of high-quality external priors. However, existing RefSR methods rely on paired images and require manual selection of reference images in practical applications. To address these challenges, this paper proposes a framework for generating high-quality reference images, thereby overcoming the difficulties associated with manual selection in RefSR. We propose a novel Self-Supervised Reference-based Image Super-Resolution method (SSR-SR), which employs a conditional diffusion model and self-supervised learning (SSL) representations to generate reference images with a high degree of semantic similarity to the input image. Since reference images can prioritize perceptual quality over fidelity, we enhance these reference images using a diffusion-based super-resolution approach. The framework also includes a dynamic aggregation module and a contrastive alignment network to ensure precise texture transfer and robust alignment between the low-resolution (LR) input and the high-resolution (HR) reference. Experimental results on multiple benchmarks demonstrate that our proposed SSR-SR achieves competitive results without relying on paired data. This work highlights the potential of diffusion models and SSL representations in advancing the field of image super-resolution.
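
To make the pipeline in the abstract concrete, the sketch below mocks up its main stages in Python/PyTorch: a frozen SSL encoder that produces a semantic embedding of the LR input, a conditional diffusion generator that samples an HR reference from that embedding, and an aggregation module that transfers reference texture onto the upsampled LR image. All class names (SSLEncoder, ConditionalDiffusionRefGen, TextureAggregator), layer sizes, and the simplified sampling loop are illustrative assumptions, not the authors' implementation; the contrastive alignment network and the diffusion-based reference enhancement step are omitted for brevity.

# A minimal, hypothetical sketch of the SSR-SR pipeline described in the abstract.
# Module names and shapes are assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SSLEncoder(nn.Module):
    """Stand-in for a frozen self-supervised backbone (e.g. a DINO-style ViT)."""
    def __init__(self, embed_dim=384):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.net(x).flatten(1)            # (B, embed_dim) semantic code


class ConditionalDiffusionRefGen(nn.Module):
    """Toy denoiser conditioned on the SSL embedding; generates an HR reference."""
    def __init__(self, embed_dim=384):
        super().__init__()
        self.cond_proj = nn.Linear(embed_dim, 64)
        self.denoise = nn.Sequential(
            nn.Conv2d(3 + 64, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, noisy, cond):
        # Broadcast the semantic embedding spatially and concatenate with the noisy image.
        c = self.cond_proj(cond)[:, :, None, None].expand(-1, -1, *noisy.shape[-2:])
        return self.denoise(torch.cat([noisy, c], dim=1))

    @torch.no_grad()
    def sample(self, cond, size, steps=4):
        x = torch.randn(cond.size(0), 3, *size)
        for _ in range(steps):                   # heavily simplified sampling loop
            x = x - 0.5 * self.forward(x, cond)
        return x


class TextureAggregator(nn.Module):
    """Aggregation step: fuse the generated reference into the upsampled LR branch."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, lr_up, ref):
        ref = F.interpolate(ref, size=lr_up.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.fuse(torch.cat([lr_up, ref], dim=1))


def ssr_sr(lr, scale=4):
    """End-to-end sketch: LR -> SSL code -> self-generated reference -> SR output."""
    encoder, ref_gen, agg = SSLEncoder(), ConditionalDiffusionRefGen(), TextureAggregator()
    cond = encoder(lr)                                        # semantic embedding of LR
    hr_size = (lr.shape[-2] * scale, lr.shape[-1] * scale)
    reference = ref_gen.sample(cond, hr_size)                 # generated HR reference
    lr_up = F.interpolate(lr, size=hr_size, mode="bicubic", align_corners=False)
    return agg(lr_up, reference)                              # texture-transferred SR image


if __name__ == "__main__":
    out = ssr_sr(torch.randn(1, 3, 32, 32))
    print(out.shape)                                          # torch.Size([1, 3, 128, 128])

Running the script prints an output of shape (1, 3, 128, 128) for a 32x32 input at scale 4, which is the only behaviour this sketch is meant to demonstrate.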

This work was supported by the National Natural Science Foundation of China under Grant 61906009, the Scientific Research Common Program of Beijing Municipal Commission of Education KM202010005018, the Beijing Municipal Natural Science Foundation (Project No. 4232017), and the International Research Cooperation Seed Fund of Beijing University of Technology (Project No. 2021B06).



Author information


Corresponding author

Correspondence to Na Qi.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shi, S., Qi, N., Li, Y., Zhu, Q. (2025). Self-supervised Reference-Based Image Super-Resolution with Conditional Diffusion Model. In: Ide, I., et al. MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15522. Springer, Singapore. https://doi.org/10.1007/978-981-96-2064-7_32

Download citation

  • DOI: https://doi.org/10.1007/978-981-96-2064-7_32

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-2063-0

  • Online ISBN: 978-981-96-2064-7

  • eBook Packages: Computer Science, Computer Science (R0)
