research-article

Omniscient Video Super-Resolution with Explicit-Implicit Alignment

Authors:
Peng Yi, School of Computer Science, Wuhan University, China (ORCID: 0000-0001-9366-951X)
Zhongyuan Wang, School of Computer Science, Wuhan University, China (ORCID: 0000-0003-3268-6177)
Laigan Luo, The Electronic Information School, Wuhan University, China (ORCID: 0000-0001-6960-5006)
Kui Jiang, School of Computer Science, Wuhan University, China (ORCID: 0000-0002-4055-7503)
Zheng He, School of Computer Science, Wuhan University, China (ORCID: 0000-0002-7700-0901)
Junjun Jiang, The School of Computer Science and Technology, Harbin Institute of Technology, China (ORCID: 0000-0002-5694-505X)
Tao Lu, The School of Computer Science and Engineering, Wuhan Institute of Technology, China (ORCID: 0000-0001-8117-2012)
Jiayi Ma, The Electronic Information School, Wuhan University, China (ORCID: 0000-0002-0150-4601)

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5, Article No. 150, pp. 1–23. https://doi.org/10.1145/3640346

Published: 07 February 2024

Abstract

When modeling temporal relationships, most previous video super-resolution (VSR) methods follow either an iterative or a recurrent framework. The iterative framework draws on neighboring low-resolution (LR) frames from a sliding window, while the recurrent framework uses the output generated in the previous SR step. The hybrid framework combines the two but still cannot fully leverage the temporal relationships. Meanwhile, existing methods are limited by the receptive field of the optical flow or lack semantic constraints on motion information. In this work, we propose an omniscient framework to fully explore the temporal relationships in a video, which encompasses both LR frames and SR outputs from the past, present, and future. The omniscient framework is more generic because the iterative, recurrent, and hybrid frameworks can be regarded as its special cases. In addition, to handle motion information, most previous VSR methods adopt explicit motion estimation and compensation, while many recent methods turn to implicit alignment. Among implicit alignment methods, basic non-local means suffers from heavy computational costs, so we improve it by capturing non-local correlations in a relatively local manner to reduce the complexity. Moreover, we integrate the explicit and implicit methods into an explicit-implicit alignment module to better utilize motion information. We have conducted extensive experiments on public datasets, which show that our method is superior to state-of-the-art methods in objective metrics, subjective visual quality, and complexity. In particular, on the Vid4 and UDM10 datasets, our method improves PSNR by 0.19 dB and 0.49 dB, respectively, over the most advanced method, BasicVSR++.
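
The idea of capturing non-local correlations "in a relatively local manner" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `local_nonlocal_align`, the `radius` parameter, the scaled dot-product similarity, the softmax weighting, and the edge padding are all assumptions chosen for illustration. The sketch only shows the general principle of restricting each position's correlation search to a small window in a neighboring frame's feature map.

```python
import numpy as np


def local_nonlocal_align(ref, nbr, radius=1):
    """Hypothetical sketch: aggregate neighbor-frame features toward a
    reference frame using softmax similarity restricted to a
    (2*radius+1) x (2*radius+1) window around each position.

    ref, nbr: float arrays of shape (H, W, C).
    Returns (aligned, weights) with shapes (H, W, C) and (k*k, H, W).
    """
    h, w, c = ref.shape
    k = 2 * radius + 1
    # Replicate border values so every position has a full window (assumption).
    padded = np.pad(nbr, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Stack the k*k shifted copies of the neighbor features: (k*k, H, W, C).
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    )
    # Scaled dot-product similarity between each reference pixel and its candidates.
    scores = np.einsum("hwc,nhwc->nhw", ref, windows) / np.sqrt(c)
    scores -= scores.max(axis=0, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Similarity-weighted sum of the candidates at each position.
    aligned = np.einsum("nhw,nhwc->hwc", weights, windows)
    return aligned, weights


rng = np.random.default_rng(0)
ref = rng.standard_normal((6, 5, 4))
nbr = rng.standard_normal((6, 5, 4))
aligned, weights = local_nonlocal_align(ref, nbr, radius=1)
print(aligned.shape, weights.shape)
```

In this sketch each position is compared against k² = (2·radius+1)² candidates instead of all H·W positions, so the similarity computation scales as O(H·W·k²·C) rather than the O((H·W)²·C) of a full non-local operation.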


Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision problems > Reconstruction

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 5
May 2024
650 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3613634
Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 February 2024
- Online AM: 11 January 2024
- Accepted: 8 January 2024
- Revised: 22 December 2023
- Received: 27 September 2023
Author Tags
Convolutional Neural Network
Video Super-Resolution
Omniscient Framework
Explicit-Implicit Alignment