Abstract
Video super-resolution (VSR) aims to estimate and restore high-resolution (HR) sequences from low-resolution (LR) inputs. In recent years, many learning-based VSR methods have been proposed that combine convolutional neural networks (CNNs) with motion compensation. Most mainstream approaches rely on optical flow or deformable convolution, both of which require accurate motion estimates for compensation. However, most previous methods do not fully exploit the spatial-temporal symmetry information in the input sequences. Moreover, considerable computation is consumed by aligning every neighbouring frame to the reference frame separately. Furthermore, many methods reconstruct HR results at only a single scale, which limits the reconstruction accuracy of the network and its performance in complex scenes. In this study, we propose a spatial-temporal symmetry network (STSN) to address these deficiencies. STSN comprises four parts: prefusion, alignment, postfusion and reconstruction. First, a two-stage fusion strategy is applied to reduce the computational cost of the network: ConvGRU is employed in the prefusion module to eliminate redundant features between neighbouring frames and to fuse and condense several neighbouring frames into two parts. To generate accurate offset maps, we present a spatial-temporal symmetry attention block (STSAB), which exploits spatial-temporal symmetry combined with spatial attention. In the reconstruction module, we propose an SR multiscale residual block (SR-MSRB) to enhance reconstruction performance. Extensive experiments on several datasets show that our method achieves better accuracy and efficiency, in both quantitative and qualitative measures, than state-of-the-art methods.
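As background for the prefusion module (this is standard material, not notation from the abstract itself): the ConvGRU cell used for temporal fusion follows the convolutional gated recurrent unit of Ballas et al. (2015), which replaces the matrix products of a standard GRU with 2D convolutions so that the hidden state retains spatial structure:

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z * x_t + U_z * h_{t-1}\right) && \text{(update gate)}\\
r_t &= \sigma\!\left(W_r * x_t + U_r * h_{t-1}\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\!\left(W * x_t + U * \left(r_t \odot h_{t-1}\right)\right) && \text{(candidate state)}\\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(hidden state)}
\end{aligned}
```

Here $*$ denotes 2D convolution, $\odot$ elementwise multiplication, $x_t$ the features of the current frame, and $h_t$ the fused hidden state; the update gate $z_t$ controls how much redundant information from neighbouring frames is carried forward versus discarded, which is what allows the prefusion stage to condense several neighbouring frames before alignment.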
Wang, X., Liu, M. & Wei, P. Learning a spatial-temporal symmetry network for video super-resolution. Appl Intell 53, 3530–3544 (2023). https://doi.org/10.1007/s10489-022-03603-3