
Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution


Abstract:

Recently, diffusion models have achieved remarkable success in blind image super-resolution. However, most existing methods rely solely on uni-modal degraded low-resolution images to guide diffusion models for restoring high-fidelity images, resulting in inferior realism. In this letter, we propose a Multi-modal Prior-Guided diffusion model for blind image Super-Resolution (MPGSR), which fine-tunes Stable Diffusion (SD) by utilizing the superior visual-and-textual guidance for restoring realistic high-resolution images. Specifically, our MPGSR involves two stages, i.e., multi-modal guidance extraction and adaptive guidance injection. For the former, we propose a composited transformer and further incorporate it with GPT-CLIP to extract the representative visual-and-textual guidance. For the latter, we design a feature calibration ControlNet to inject the visual guidance and employ the cross-attention layer provided by the frozen SD to inject the textual guidance, thus effectively activating the powerful text-to-image generation potential. Extensive experiments show that our MPGSR outperforms state-of-the-art methods in restoration quality and convergence time.
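The abstract describes two guidance-injection paths: visual guidance added through a ControlNet-style branch and textual guidance added through the frozen SD cross-attention layers. The following PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and all module names, shapes, and the single toy denoising block are illustrative assumptions.

```python
# Hypothetical sketch of the two guidance paths described in the abstract:
# a ControlNet-style branch adds visual guidance to the denoising UNet's
# features, while textual guidance enters through cross-attention.
# Names and shapes are illustrative, not the authors' code.
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Injects textual guidance (e.g., CLIP embeddings of a GPT caption) into image tokens."""
    def __init__(self, dim, text_dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim=text_dim, vdim=text_dim,
                                          batch_first=True)

    def forward(self, x, text_emb):
        # x: (B, N, dim) flattened image tokens; text_emb: (B, T, text_dim)
        out, _ = self.attn(x, text_emb, text_emb)
        return x + out  # residual injection


class VisualControlBranch(nn.Module):
    """ControlNet-style branch: encodes the degraded LR image and emits residual
    features that are added into the frozen UNet's feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Zero-initialized projection (as in ControlNet) so training starts
        # from the frozen model's behaviour.
        self.zero_proj = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, lr_image):
        return self.zero_proj(self.encoder(lr_image))


class GuidedDenoisingBlock(nn.Module):
    """One toy denoising block standing in for a frozen SD UNet block."""
    def __init__(self, channels, text_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # placeholder for frozen SD weights
        self.cross_attn = CrossAttention(channels, text_dim)

    def forward(self, feat, visual_residual, text_emb):
        feat = self.conv(feat) + visual_residual                 # visual guidance injection
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        tokens = self.cross_attn(tokens, text_emb)               # textual guidance injection
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Toy usage with assumed sizes.
channels, text_dim = 64, 768
block = GuidedDenoisingBlock(channels, text_dim)
control = VisualControlBranch(channels)
lr = torch.randn(1, 3, 64, 64)            # degraded low-resolution input
feat = torch.randn(1, channels, 64, 64)   # current UNet feature map
text = torch.randn(1, 77, text_dim)       # text embeddings of an image caption
out = block(feat, control(lr), text)
print(out.shape)  # torch.Size([1, 64, 64, 64])
```

In this sketch the zero-initialized projection keeps the frozen model's behaviour unchanged at the start of fine-tuning, so the visual branch learns a calibration on top of Stable Diffusion rather than replacing it.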
Published in: IEEE Signal Processing Letters (Volume: 32)
Page(s): 316 - 320
Date of Publication: 12 December 2024

