Abstract
Video super-resolution (video SR) usually involves several steps: motion estimation, motion compensation, fusion, and upsampling. Here, we propose a novel architecture for video SR. First, in place of motion estimation and compensation, the architecture uses a specially designed deformable convolution shared-assignment network. The model requires no warping operation and uses a three-layer pyramid deformable convolution network. Second, inspired by the ideas of back-projection and the Encoder-Decoder structure, we propose a deep recursive fusion network that fuses multi-frame information for the target frame. The fusion network adopts a Decoder-Encoder structure with shared weights to construct the back-projection network, and concatenates the output of each back-projection layer. This design not only reduces the network's parameter requirements through weight sharing, but also deepens the network structure so that it can extract deeper image features and achieve fusion. Extensive evaluations and comparisons with previous methods validate the strengths of this approach and demonstrate that the proposed framework significantly outperforms the current state of the art.
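The recursive back-projection idea with shared weights can be illustrated with a minimal sketch. This is not the network described in the paper: it assumes 1-D signals, and fixed nearest-neighbour upsampling and average pooling stand in for the learned Decoder and Encoder. The function names (`upsample`, `downsample`, `back_projection_step`, `recursive_fusion`) and the `step` parameter are hypothetical and chosen only for illustration. The deformable convolution alignment stage is omitted entirely.

```python
# Illustrative sketch only: fixed operators replace the learned
# Decoder/Encoder, and all names here are hypothetical.

def upsample(x, scale=2):
    """Stand-in 'decoder': nearest-neighbour upsampling of a 1-D signal."""
    return [v for v in x for _ in range(scale)]


def downsample(y, scale=2):
    """Stand-in 'encoder': average pooling back to low resolution."""
    return [sum(y[i:i + scale]) / scale for i in range(0, len(y), scale)]


def back_projection_step(lr, hr_estimate, scale=2, step=0.5):
    """One back-projection: project the HR estimate down to LR,
    measure the residual against the LR input, and push it back up."""
    projected = downsample(hr_estimate, scale)
    residual = [a - b for a, b in zip(lr, projected)]
    correction = upsample(residual, scale)
    return [h + step * c for h, c in zip(hr_estimate, correction)]


def recursive_fusion(lr, hr_init, num_recursions=3, scale=2, step=0.5):
    """Reuse the same step (shared 'weights') at every recursion and
    collect each intermediate output. A real network would concatenate
    these outputs along the channel dimension before reconstruction."""
    outputs = []
    hr = hr_init
    for _ in range(num_recursions):
        hr = back_projection_step(lr, hr, scale, step)
        outputs.append(hr)
    return outputs


if __name__ == "__main__":
    lr = [1.0, 3.0]
    for i, out in enumerate(recursive_fusion(lr, [0.0] * 4), start=1):
        print(i, out)
```

Because every recursion calls the same `back_projection_step`, the depth of the refinement grows without adding new operators, which mirrors the parameter saving that weight sharing provides. In this toy setting the residual against the low-resolution input halves at each recursion when `step=0.5`.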
Data availability
The datasets analysed during the current study are available in the Vimeo repository: http://data.csail.mit.edu/tofu/dataset/vimeo_septuplet.
Funding
This study was funded by the National Natural Science Foundation Youth Fund (61601404), General Scientific Research Projects of Zhejiang Education Department (Y201840087), and the Opening Foundation of State Key Laboratory of Cognitive Intelligence, iFLYTEK (CIOS-2022SC06).
Ethics declarations
Conflict of interest
The author declares he has no conflict of interest.
About this article
Cite this article
Mu, S., Zhang, Y. & Jiang, Y. DRN-VideoSR: a deep recursive network for video super-resolution based on a deformable convolution shared-assignment network. Multimed Tools Appl 82, 14019–14035 (2023). https://doi.org/10.1007/s11042-022-13818-8
DOI: https://doi.org/10.1007/s11042-022-13818-8