Omniscient Video Super-Resolution with Explicit-Implicit Alignment

Published: 07 February 2024

Abstract

When modeling temporal relationships, most previous video super-resolution (VSR) methods follow either the iterative or the recurrent framework. The iterative framework takes neighboring low-resolution (LR) frames from a sliding window, while the recurrent framework reuses the output generated in the previous SR step. The hybrid framework combines the two but still cannot fully leverage the temporal relationships. Meanwhile, existing methods are limited by the receptive field of the optical flow or lack semantic constraints on motion information. In this work, we propose an omniscient framework to fully explore the temporal relationships in the video, which encompasses both LR frames and SR outputs from the past, present, and future. The omniscient framework is more generic because the iterative, recurrent, and hybrid frameworks can be regarded as its special cases. In addition, to handle motion information, most previous VSR methods adopt explicit motion estimation and compensation, while many recent methods turn to implicit alignment. Among implicit alignment methods, basic non-local means suffers from heavy computational costs, so we improve it by capturing the non-local correlations in a relatively local manner to reduce the complexity. Moreover, we integrate the explicit and implicit methods into an explicit-implicit alignment module to better utilize motion information. We have conducted extensive experiments on public datasets, which show that our method is superior to the state-of-the-art methods in objective metrics, subjective visual quality, and complexity. In particular, on the Vid4 and UDM10 datasets, our method improves PSNR by 0.19 dB and 0.49 dB, respectively, over the most advanced method, BasicVSR++.
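The claim that the iterative, recurrent, and hybrid frameworks are special cases of the omniscient one can be illustrated with a small sketch. This is not the authors' implementation: the function name `select_inputs`, the `window` parameter, and the exact index sets chosen for each framework are assumptions made only to mirror the wording of the abstract (for instance, the recurrent case is assumed to read the current LR frame plus the previous SR output).

```python
def select_inputs(t, num_frames, framework, window=1):
    """Return (lr_indices, sr_indices) that the SR step for frame t may read.

    Hypothetical illustration only: the index sets follow the abstract's
    verbal description, not the paper's actual network design.
    """
    lo = max(0, t - window)
    hi = min(num_frames - 1, t + window)
    lr_window = list(range(lo, hi + 1))   # past, present, and future LR frames
    past_sr = [t - 1] if t > 0 else []    # SR output of the previous step

    if framework == "iterative":
        # Sliding window of neighboring LR frames, no SR feedback.
        return lr_window, []
    if framework == "recurrent":
        # Current LR frame plus the previously generated SR output (assumed).
        return [t], past_sr
    if framework == "hybrid":
        # Sliding LR window combined with the previous SR output.
        return lr_window, past_sr
    if framework == "omniscient":
        # LR frames and SR outputs from the past, present, and future.
        return lr_window, lr_window
    raise ValueError(f"unknown framework: {framework}")


if __name__ == "__main__":
    lr_all, sr_all = select_inputs(2, 5, "omniscient")
    for name in ("iterative", "recurrent", "hybrid"):
        lr, sr = select_inputs(2, 5, name)
        # Every other framework reads a subset of the omniscient inputs.
        print(name, set(lr) <= set(lr_all) and set(sr) <= set(sr_all))
```

Under these assumptions, each of the three earlier frameworks reads a subset of what the omniscient framework reads, which is the sense in which it is described as more generic.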

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
      May 2024
      650 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3613634
      • Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 February 2024
      • Online AM: 11 January 2024
      • Accepted: 8 January 2024
      • Revised: 22 December 2023
      • Received: 27 September 2023