Skip to main content
Log in

FDDCC-VSR: a lightweight video super-resolution network based on deformable 3D convolution and cheap convolution

  • Research
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

Currently, the mainstream deep video super-resolution (VSR) models typically employ deeper neural network layers or larger receptive fields. This approach increases computational requirements, making network training difficult and inefficient. Therefore, this paper proposes a VSR model called fusion of deformable 3D convolution and cheap convolution (FDDCC-VSR).In FDDCC-VSR, we first divide the detailed features of each frame in VSR into dynamic features of visual moving objects and details of static backgrounds. This division allows for the use of fewer specialized convolutions in feature extraction, resulting in a lightweight network that is easier to train. Furthermore, FDDCC-VSR incorporates multiple D-C CRBs (Convolutional Residual Blocks), which establish a lightweight spatial attention mechanism to aid deformable 3D convolution. This enables the model to focus on learning the corresponding feature details. Finally, we employ an improved bicubic interpolation combined with subpixel techniques to enhance the PSNR (Peak Signal-to-Noise Ratio) value of the original image. Detailed experiments demonstrate that FDDCC-VSR outperforms the most advanced algorithms in terms of both subjective visual effects and objective evaluation criteria. Additionally, our model exhibits a small parameter and calculation overhead.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Data availability

No datasets were generated or analyzed during the current study.

References

  1. Greenspan, H.: Super-Resolution in Medical Imaging[J]. Comput. J. 52(1), 43–63 (2009)

    Article  MATH  Google Scholar 

  2. Lillesand, T., Kiefer, R.W. and Chipman, J.: Remote sensing and image interpretation, (2014)

  3. Lobanov, A. P.: Resolution limits in astronomical images, arXiv preprint astro-ph/0503225 (2005)

  4. Caballero, J., Ledig, C., Aitken, A., et al.: Real-time video super-resolution with spatio-temporal networks and motion compensation: U.S. Patent 10,701,394[P]. (2020)

  5. Khan, A., Sargano, A.B., Habib, Z.: DSTnet: deformable spatio-temporal convolutional residual network for video super-resolution[J]. Mathematics 9(22), 2873 (2021)

    Article  MATH  Google Scholar 

  6. Song, C., Lin, Y., Guo, S., et al.: Spatial-temporal synchronous graph convolutional networks: a new framework for spatial-temporal network data forecasting[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 34(01): 914–921 (2020)

  7. Yang, X., Li, H., Li, X.: Lightweight image super-resolution with feature cheap convolution and attention mechanism[J]. Cluster Computing, 1–16 (2022)

  8. Yang, X., Zhang, Y., Guo, Y., et al.: An image super-resolution deep learning network based on multi-level feature extraction module[J]. Multimedia Tools and Applications 80(5), 7063–7075 (2021)

    Article  MATH  Google Scholar 

  9. Xie, Z., Zhang, W., Sheng, B., et al.: BaGFN: broad attentive graph fusion network for high-order feature interactions[J]. IEEE Trans. Neural Netw. Learn. Syst. (2021)

  10. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, (2014)

  11. Lin, X., Sun, S., Huang, W., et al.: EAPT: efficient attention pyramid transformer for image processing[J]. IEEE Transactions on Multimedia, (2021)

  12. Jiang, N., Sheng, B., Li, P., et al.: PhotoHelper: portrait photographing guidance via deep feature retrieval and fusion[J]. IEEE Transactions on Multimedia, (2022)

  13. Liu, S., Huang, D.: Receptive field block net for accurate and fast object detection[C]//Proceedings of the European conference on computer vision (ECCV). 385–400, (2018)

  14. Ying, X., Wang, L., Wang, Y., et al.: Deformable 3d convolution for video super-resolution[J]. IEEE Signal Process. Lett. 27, 1500–1504 (2020)

    Article  MATH  Google Scholar 

  15. Dai, J., Qi, H., Xiong, Y., et al.: Deformable convolutional networks[C]//Proceedings of the IEEE international conference on computer vision. 764–773 (2017)

  16. Wang, R., Shivanna, R., Cheng, D., et al. Dcn v2: Improved deep & cross network and practical lessons for web-scale learning to rank systems[C]//Proceedings of the Web Conference 2021.1785–1797 (2021)

  17. Zheng, Y., Zhang, M., Lu, F.: Optical flow in the dark[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6749–6757 (2020)

  18. Han, K., Wang, Y., Tian, Q., et al.: Ghostnet: More features from cheap operations[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1580–1589 (2020)

  19. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2(7) (2015)

  20. Gui, S., Wang, H., Yu, C., et al.: Adversarially trained model compression: when robustness meets efficiency[J]. arXiv preprint arXiv:1902.03538, (2019)

  21. Xue, T., Chen, B., Wu, J., et al.: Video enhancement with task-oriented flow[J]. Int. J. Comput. Vision 127(8), 1106–1125 (2019)

    Article  MATH  Google Scholar 

  22. Lim, B., Son, S., Kim, H., et al.: Enhanced deep residual networks for single image super-resolution[C]//Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 136–144 (2017)

  23. Tao, X., Gao, H., Liao, R., et al.: Detail-revealing deep video super-resolution[C]//Proceedings of the IEEE International Conference on Computer Vision. 4472–4480 (2017)

  24. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 1664–1673 (2018)

  25. Zhang, Y., Li, K., Li, K., et al. Image super-resolution using very deep residual channel attention networks[C]//Proceedings of the European conference on computer vision (ECCV). 286–301 (2018)

  26. Yi, P., Wang, Z., Jiang, K., et al.: Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations[C]//Proceedings of the IEEE/CVF international conference on computer vision. 3106–3115 (2019)

  27. Wang, X., Chan, K.C.K., Yu, K., et al.: Edvr: Video restoration with enhanced deformable convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 0–0 (2019)

  28. Wang, L., Guo, Y., Liu, L., et al.: Deep video super-resolution using HR optical flow estimation[J]. IEEE Trans. Image Process. 29, 4323–4336 (2020)

    Article  MATH  Google Scholar 

  29. Tian, Y., Zhang, Y., Fu, Y., et al.: Tdan: Temporally-deformable alignment network for video super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3360–3369 (2020)

  30. Yi, P., Wang, Z., Jiang, K., et al.: Omniscient video super-resolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 4429–4438 (2021)

  31. Xiang, X., Tian, Y., Zhang, Y., et al.: Zooming slow-mo: Fast and accurate one-stage space-time video super-resolution[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 3370–3379 (2020)

  32. Geng, Z., Liang, L., Ding, T., et al.: RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17441–17451 (2022)

  33. Wang, H., Xiang, X., Tian, Y., et al.: STDAN: Deformable Attention Network for Space-Time Video Super-Resolution[J]. arXiv preprint arXiv:2203.06841, (2022)

  34. Xu, G., Xu, J., Li, Z., et al.: Temporal modulation network for controllable space-time video super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6388–6397 (2021)

  35. Crowley, E.J., Gray, G., Storkey, A.J.: Moonshine: Distilling with cheap convolutions[J]. Advances in Neural Information Processing Systems, 31 (2018)

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (61573182, 62073164), and by the Fundamental Research Funds for the Central Universities (NS2020025).

Author information

Authors and Affiliations

Authors

Contributions

Xiaohu Wang contributed significantly to analysis and manuscript preparation, and performed the data analyses; Xin Yang contributed to the conception of the study, and wrote the manuscript; Hengrui Li performed the experiment; Tao Li helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Xin Yang.

Ethics declarations

Conflict of interest

Xiaohu Wang declares that he has no conflict of interest. Xin Yang declares that he has no conflict of interest. Hengrui Li declares that she has no conflict of interest. Tao Li declares that he has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, X., Yang, X., Li, H. et al. FDDCC-VSR: a lightweight video super-resolution network based on deformable 3D convolution and cheap convolution. Vis Comput 41, 3581–3593 (2025). https://doi.org/10.1007/s00371-024-03621-x

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-024-03621-x

Keywords