SPSTT: Second-Order Propagation Spatial Temporal Transformer Network for Space-Time Video Super-Resolution

Authors:

Hongchao Zhou

ICIGP '23: Proceedings of the 2023 6th International Conference on Image and Graphics Processing

Pages 204 - 209

https://doi.org/10.1145/3582649.3582662

Published: 07 April 2023

Abstract

In this paper, we explore the space-time video super-resolution task, which aims to generate high-frame-rate (HFR), high-resolution (HR) videos from low-frame-rate (LFR), low-resolution (LR) videos. Most existing space-time video super-resolution methods simply combine two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). These methods usually rely on recursive propagation structures, which are complex and time-consuming and do not make full use of feature information. To address these problems, we propose a single-stage space-time super-resolution architecture based on the Swin Transformer and second-order network propagation. The Swin Transformer allows the two sub-tasks to be combined naturally into a single task, while second-order propagation strengthens information flow and makes efficient use of the information in all input video frames. We also introduce a dataset pre-cleaning module, which alleviates image degradation before propagation, suppresses artifacts in the model output, and improves the reconstruction performance of the proposed model. Experimental results show that, compared with related two-stage networks, the proposed model is lighter and runs inference faster while achieving competitive performance.
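The abstract does not spell out the propagation rule, so the following is only a minimal sketch of what "second-order" propagation generally means: each hidden state is computed from the current frame and the two preceding states, rather than only the immediately preceding one. The function names `second_order_propagate` and `toy_fuse`, the fusion weights, and the zero initial states are all illustrative assumptions and are not taken from the paper, where the fusion step would be a learned block.

```python
import numpy as np

def second_order_propagate(frames, fuse):
    """Forward pass in which each state sees the two previous states."""
    zeros = np.zeros_like(frames[0])
    states = []
    for t, x in enumerate(frames):
        h1 = states[t - 1] if t >= 1 else zeros  # state from one step back
        h2 = states[t - 2] if t >= 2 else zeros  # state from two steps back
        states.append(fuse(x, h1, h2))
    return states

def toy_fuse(x, h1, h2):
    # Hypothetical stand-in for a learned fusion block (assumed weights).
    return x + 0.5 * h1 + 0.25 * h2

# Four toy "frames" of constant value 1, 2, 3, 4.
frames = [np.full((2, 2), float(i + 1)) for i in range(4)]
states = second_order_propagate(frames, toy_fuse)
```

Dropping the `h2` term recovers ordinary first-order propagation, which makes the difference easy to see: with the second-order connection, information from frame `t - 2` reaches frame `t` directly instead of only through the intermediate state.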

Index Terms

SPSTT: Second-Order Propagation Spatial Temporal Transformer Network for Space-Time Video Super-Resolution
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

ICIGP '23: Proceedings of the 2023 6th International Conference on Image and Graphics Processing

January 2023

246 pages

ISBN: 9781450398572

DOI: 10.1145/3582649

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICIGP 2023

ICIGP 2023: 2023 The 6th International Conference on Image and Graphics Processing

January 6 - 8, 2023

Chongqing, China
