ABSTRACT
Video frame interpolation techniques provide a smoother visual experience by increasing the temporal resolution of videos. To generate intermediate frames, many techniques estimate parameters such as optical flow and occlusion masks directly on the original-resolution images. As a result, processing high-resolution images requires more computing power and longer inference time. This paper proposes a lightweight network for high-resolution video frame interpolation that performs the complete interpolation workflow on low-resolution images to obtain accurate low-resolution optical flow and occlusion masks. To restore the optical flow and mask at the original resolution, we propose an extremely lightweight temporal difference enhancement module that uses the motion information hidden in the temporal difference between frames to aid the restoration. Compared with current mainstream networks, the proposed network achieves comparable performance and faster inference on high-resolution video interpolation. Ablation experiments demonstrate the importance of the temporal difference module.
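The low-resolution workflow described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' network: the function names, the average-pool downsampling, the nearest-neighbour upsampling, and the hand-written low-resolution flow are all assumptions made for illustration. It shows only two facts the approach relies on: a flow field estimated at 1/s resolution must have its displacements multiplied by s when it is brought back to the original resolution, and the temporal difference between two frames is non-zero exactly where motion changed the image.

```python
import numpy as np

def downsample(frame: np.ndarray, s: int) -> np.ndarray:
    """Average-pool an (H, W) frame by an integer factor s (H, W divisible by s)."""
    h, w = frame.shape
    return frame.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample_flow(flow_lr: np.ndarray, s: int) -> np.ndarray:
    """Nearest-neighbour upsample an (h, w, 2) flow field by s.

    Flow values are displacements in pixels, so they are scaled by s as well.
    """
    return np.repeat(np.repeat(flow_lr, s, axis=0), s, axis=1) * s

def temporal_difference(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    """Per-pixel difference between two consecutive frames."""
    return frame1.astype(np.float32) - frame0.astype(np.float32)

# Toy data (hypothetical): one bright pixel moves 4 px to the right.
s = 4
frame0 = np.zeros((8, 8), dtype=np.float32)
frame1 = np.zeros((8, 8), dtype=np.float32)
frame0[2, 2] = 1.0
frame1[2, 6] = 1.0

lr0 = downsample(frame0, s)  # low-resolution input, shape (2, 2)

# Placeholder for a flow estimated at low resolution (assumed, not computed):
# the top-left low-res cell moves one low-res pixel to the right.
flow_lr = np.zeros((2, 2, 2), dtype=np.float32)
flow_lr[0, 0] = (1.0, 0.0)

flow_hr = upsample_flow(flow_lr, s)         # shape (8, 8, 2), displacement 4 px
diff = temporal_difference(frame0, frame1)  # non-zero only at the moved pixel
```

In the paper, the restoration step is a learned module guided by the temporal difference; the fixed nearest-neighbour upsampling here only stands in for it to make the scaling relationship concrete.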
Index Terms
- Temporal Difference Enhancement for High-Resolution Video Frame Interpolation