
Learning a spatial-temporal symmetry network for video super-resolution

Published in Applied Intelligence

Abstract

Video super-resolution (VSR) methods estimate and restore high-resolution (HR) sequences from low-resolution (LR) input. In recent years, many learning-based VSR methods have been proposed that combine convolutional neural networks (CNNs) with motion compensation. Most mainstream approaches are based on optical flow or deformable convolution, both of which require accurate motion estimation for compensation. However, most previous methods fail to fully exploit the spatial-temporal symmetry of the input sequences. Moreover, considerable computation is spent aligning every neighbouring frame to the reference frame separately. Furthermore, many methods reconstruct HR results at only a single scale, which limits the reconstruction accuracy of the network and its performance in complex scenes. In this study, we propose a spatial-temporal symmetry network (STSN) to address these deficiencies. STSN consists of four parts: prefusion, alignment, postfusion and reconstruction. First, a two-stage fusion strategy is applied to reduce the computational cost of the network. In the prefusion module, ConvGRU eliminates redundant features between neighbouring frames and condenses several neighbouring frames into two parts. To generate accurate offset maps, we present a spatial-temporal symmetry attention block (STSAB), which exploits spatial-temporal symmetry in combination with spatial attention. In the reconstruction module, we propose an SR multiscale residual block (SR-MSRB) to enhance reconstruction performance. Extensive experiments on several datasets show that our method outperforms state-of-the-art methods in both quantitative and qualitative metrics, with better accuracy and efficiency.
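The abstract describes a ConvGRU-based prefusion step that condenses a sequence of neighbouring-frame features into a compact state before alignment. The paper's actual architecture is not given here, so the following is only a minimal sketch of the general ConvGRU fusion idea: the standard GRU gate equations applied per pixel, with the convolutions simplified to 1×1 (a channel-mixing matrix) for brevity. All function and parameter names (`convgru_step`, `prefuse`, the weight matrices) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 'convolution': mix channels with matrix w.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convgru_step(h, x, wz, uz, wr, ur, wh, uh):
    """One GRU update applied at every spatial location."""
    z = sigmoid(conv1x1(x, wz) + conv1x1(h, uz))          # update gate
    r = sigmoid(conv1x1(x, wr) + conv1x1(h, ur))          # reset gate
    h_tilde = np.tanh(conv1x1(x, wh) + conv1x1(r * h, uh))  # candidate state
    return (1.0 - z) * h + z * h_tilde                    # gated blend

def prefuse(frames, params):
    """Condense a list of (C, H, W) frame features into one hidden state,
    discarding redundancy between neighbouring frames along the way."""
    c, height, width = frames[0].shape
    h = np.zeros((c, height, width))
    for x in frames:
        h = convgru_step(h, x, *params)
    return h
```

Because the hidden state is always a convex combination of its previous value and a tanh-bounded candidate, the fused feature stays bounded regardless of sequence length, which is one reason recurrent fusion is a cheap way to aggregate many neighbouring frames before a single alignment pass.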





Author information

Correspondence to Mingliang Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, X., Liu, M. & Wei, P. Learning a spatial-temporal symmetry network for video super-resolution. Appl Intell 53, 3530–3544 (2023). https://doi.org/10.1007/s10489-022-03603-3
