DOI: 10.1145/3582649.3582662
research-article

SPSTT: Second-Order Propagation Spatial Temporal Transformer Network for Space-Time Video Super-Resolution

Published: 07 April 2023

Abstract

In this paper, we explore the space-time video super-resolution task, which aims to generate high-frame-rate (HFR), high-resolution (HR) videos from low-frame-rate (LFR), low-resolution (LR) videos. Most existing space-time video super-resolution methods simply combine the two sub-tasks of video frame interpolation (VFI) and video super-resolution (VSR). These methods usually rely on recursive propagation structures that are complex, time-consuming, and do not make full use of feature information. To address these problems, we propose a single-stage space-time super-resolution architecture based on the Swin Transformer and second-order network propagation. The Swin Transformer allows the two sub-tasks to be combined naturally into a single task, while second-order network propagation enhances information propagation and efficiently uses the information in all input video frames. We also introduce a dataset pre-cleaning module, which alleviates image degradation before propagation, suppresses artifacts in the model output, and improves the reconstruction performance of the proposed model. Experimental results show that, compared with related two-stage networks, the proposed model is lighter and has faster inference speed while achieving competitive performance.
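The abstract does not spell out how second-order propagation is wired, so the following is only a minimal sketch of the general idea, assuming that "second-order" means each propagation step receives the states of the two preceding steps rather than only the last one. The names `fuse` and `second_order_propagate`, the fixed fusion weights, and the use of plain float lists as "features" are all hypothetical stand-ins; a real model would use learned layers (for example Swin Transformer blocks) on feature maps.

```python
from typing import Callable, List, Sequence

Feature = List[float]


def fuse(current: Feature, prev1: Feature, prev2: Feature) -> Feature:
    """Hypothetical stand-in for a learned fusion block: a fixed weighted sum."""
    return [0.5 * c + 0.3 * p1 + 0.2 * p2
            for c, p1, p2 in zip(current, prev1, prev2)]


def second_order_propagate(
    frames: Sequence[Feature],
    fuse_fn: Callable[[Feature, Feature, Feature], Feature] = fuse,
) -> List[Feature]:
    """Forward propagation in which step i sees the states of steps i-1 and i-2.

    Assumption: missing history at the start of the sequence is zero-filled.
    """
    if not frames:
        return []
    zeros = [0.0] * len(frames[0])
    prev1, prev2 = zeros, zeros  # states from steps i-1 and i-2
    states: List[Feature] = []
    for feat in frames:
        state = fuse_fn(feat, prev1, prev2)
        states.append(state)
        prev2, prev1 = prev1, state  # shift the two-step history window
    return states


if __name__ == "__main__":
    # Three one-dimensional "frame features" for illustration.
    for s in second_order_propagate([[1.0], [1.0], [1.0]]):
        print(s)
```

Compared with a first-order recurrence, the extra connection to step i-2 gives each state a second, shorter path to earlier frames, which is one plausible reading of how the propagation "efficiently uses the information in all input video frames".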

    Published In

    ICIGP '23: Proceedings of the 2023 6th International Conference on Image and Graphics Processing
    January 2023
    246 pages
    ISBN:9781450398572
    DOI:10.1145/3582649
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deep Learning
    2. Pre-Cleaning
    3. Second-Order Propagation
    4. Space-Time Video Super-Resolution
    5. Swin Transformer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIGP 2023

