L2BEC2: Local Lightweight Bidirectional Encoding and Channel Attention Cascade for Video Frame Interpolation

Abstract
Video frame interpolation (VFI) is important for many video applications, yet it remains challenging even in the era of deep learning. Some existing VFI models directly adopt lightweight network architectures, so the synthesized in-between frames are blurry and contain artifacts caused by imprecise motion representation. Other VFI models rely on heavy architectures with large numbers of parameters, which prevents their deployment on small terminal devices. To address these issues, we propose a local lightweight VFI network (L2BEC2) built on a bidirectional encoding structure with a channel attention cascade. Specifically, we improve visual quality by introducing a forward-and-backward encoding structure with a channel attention cascade that better characterizes motion information. Furthermore, we apply a local lightweight strategy to the state-of-the-art Adaptive Collaboration of Flows (AdaCoF) model to reduce its parameter count. Compared with the original AdaCoF model, the proposed L2BEC2 achieves a performance gain with only one-third of the parameters and performs favorably against state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.
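The abstract does not specify the exact design of the channel attention cascade, but channel attention modules of this kind typically follow a squeeze-and-excitation pattern: global average pooling per channel, a small bottleneck MLP, and a sigmoid gate that rescales each feature channel. The following NumPy sketch illustrates that generic pattern only; the function name, the reduction ratio `r`, and the weight shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def channel_attention(x, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    x: feature map of shape (C, H, W).
    w1, b1: reduction FC layer mapping C -> C // r.
    w2, b2: expansion FC layer mapping C // r -> C.
    Returns the input rescaled by learned per-channel weights.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating -> (C,) in (0, 1)
    h = np.maximum(0.0, w1 @ s + b1)
    a = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Scale: reweight each channel of the feature map
    return x * a[:, None, None]

# Toy usage: 8 channels, reduction ratio r = 4, random weights
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 16, 16))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = channel_attention(x, w1, b1, w2, b2)
print(y.shape)  # (8, 16, 16)
```

Because the gate values lie in (0, 1), the module can only attenuate channels, never amplify them; in a cascade, successive gates progressively emphasize the channels most informative for motion.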