DOI: 10.1145/3587716.3587788
research-article

Temporal Difference Enhancement for High-Resolution Video Frame Interpolation

Published: 07 September 2023

ABSTRACT

Video frame interpolation provides a smoother visual experience by increasing the temporal resolution of videos. To generate intermediate frames, many techniques estimate parameters such as optical flow and occlusion masks directly on the original-resolution images. As a result, processing high-resolution images demands more computing power and longer inference time. This paper proposes a lightweight network for high-resolution video frame interpolation that performs the complete interpolation workflow on low-resolution images to obtain accurate low-resolution optical flow and occlusion masks. To effectively restore the optical flow and mask at the original resolution, we propose an extremely lightweight temporal difference enhancement module that uses the motion information hidden in the temporal difference to aid the restoration. For high-resolution video interpolation, the proposed network achieves comparable performance and faster inference than current mainstream networks. An ablation experiment demonstrates the importance of the temporal difference module.
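
The abstract gives no implementation details, so the following is only an illustrative sketch of the basic quantities it mentions, not the authors' network. It assumes NumPy arrays of shape (H, W, C) for frames and (h, w, 2) for flow fields. The function names `temporal_difference`, `downsample`, and `upsample_flow`, the average pooling, and the nearest-neighbour upsampling are all hypothetical choices made for this example. It shows that a temporal difference is a signed per-pixel subtraction of consecutive frames, and that a flow field estimated at low resolution must have its vectors scaled by the resolution factor when it is brought back to the original resolution.

```python
import numpy as np


def temporal_difference(frame0, frame1):
    """Signed per-pixel difference between two consecutive frames.

    Frames are cast to float32 first so that uint8 inputs do not wrap around.
    """
    return frame1.astype(np.float32) - frame0.astype(np.float32)


def downsample(frame, factor):
    """Average-pool an (H, W, C) frame by an integer factor.

    Assumption for this sketch: H and W are divisible by `factor`.
    """
    h, w, c = frame.shape
    blocks = frame.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def upsample_flow(flow_lr, factor):
    """Nearest-neighbour upsample an (h, w, 2) flow field by an integer factor.

    Displacements measured in low-resolution pixels are multiplied by
    `factor` so that they are expressed in high-resolution pixels.
    """
    flow_hr = np.repeat(np.repeat(flow_lr, factor, axis=0), factor, axis=1)
    return flow_hr * factor


# Toy example: a 2x2 bright square moves two pixels to the right.
frame0 = np.zeros((8, 8, 3), dtype=np.uint8)
frame1 = np.zeros((8, 8, 3), dtype=np.uint8)
frame0[2:4, 2:4] = 255
frame1[2:4, 4:6] = 255

diff = temporal_difference(frame0, frame1)  # non-zero only where motion occurred
small0 = downsample(frame0, 4)              # 2x2 low-resolution frame

# A made-up low-resolution flow: half a low-resolution pixel to the right.
flow_lr = np.zeros((2, 2, 2), dtype=np.float32)
flow_lr[0, 0] = [0.5, 0.0]
flow_hr = upsample_flow(flow_lr, 4)         # 8x8 flow, vectors scaled by 4
```

In this toy case the difference image is non-zero only in the regions the square left and entered, which is the kind of motion cue the abstract says the enhancement module exploits. How the paper actually combines that cue with the upsampled flow and mask is not stated in the abstract.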

Published in

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing
February 2023
619 pages
ISBN: 9781450398411
DOI: 10.1145/3587716

Copyright © 2023 ACM

Publisher

Association for Computing Machinery, New York, NY, United States