Research Article (Refereed)

L2BEC2: Local Lightweight Bidirectional Encoding and Channel Attention Cascade for Video Frame Interpolation

Published: 6 February 2023

Abstract

Video frame interpolation (VFI) is of great importance for many video applications, yet it remains challenging even in the era of deep learning. Some existing VFI models directly adopt off-the-shelf lightweight network frameworks, so the synthesized in-between frames are blurry and exhibit artifacts due to imprecise motion representation. Other VFI models rely on heavy architectures with large numbers of parameters, which prevents their deployment on small terminals. To address these issues, we propose a local lightweight VFI network (L2BEC2) that couples a bidirectional encoding structure with a channel attention cascade. Specifically, we improve visual quality by introducing a forward and backward encoding structure with a channel attention cascade to better characterize motion information. Furthermore, we introduce a local lightweight strategy into the state-of-the-art Adaptive Collaboration of Flows (AdaCoF) model to reduce its parameter count. Compared with the original AdaCoF model, the proposed L2BEC2 achieves a performance gain with only one-third of the parameters and performs favorably against state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.
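The abstract names three architectural ingredients: a forward/backward (bidirectional) encoding of the input frame pair, a channel attention cascade, and a local lightweight substitution for dense convolutions in AdaCoF. The paper's actual implementation lives in the linked repository; purely as an illustration of those ingredients, the PyTorch sketch below combines a squeeze-and-excitation-style channel attention gate with depthwise separable convolutions, the standard lightweight replacement for a dense KxK convolution. Every class name, channel width, and the fusion scheme here is a hypothetical assumption, not the authors' L2BEC2 design.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention: global average pooling produces
    per-channel statistics, and a small bottleneck MLP (as 1x1 convs)
    turns them into per-channel gates in (0, 1)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight each channel by its learned gate.
        return x * self.fc(self.pool(x))


class DepthwiseSeparableConv(nn.Module):
    """Depthwise KxK conv followed by a pointwise 1x1 conv: the usual
    'lightweight' substitution for a dense KxK convolution."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


class BidirectionalEncoder(nn.Module):
    """Hypothetical sketch: encode the frame pair in both temporal
    orders (0->1 and 1->0), gate each encoding with channel attention,
    and fuse the two directions with a 1x1 conv."""

    def __init__(self, feat_ch: int = 64):
        super().__init__()
        self.encode = nn.Sequential(
            DepthwiseSeparableConv(6, feat_ch),   # 2 RGB frames stacked
            nn.ReLU(inplace=True),
            DepthwiseSeparableConv(feat_ch, feat_ch),
        )
        self.attn = ChannelAttention(feat_ch)
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, kernel_size=1)

    def forward(self, frame0: torch.Tensor, frame1: torch.Tensor) -> torch.Tensor:
        fwd = self.attn(self.encode(torch.cat([frame0, frame1], dim=1)))
        bwd = self.attn(self.encode(torch.cat([frame1, frame0], dim=1)))
        return self.fuse(torch.cat([fwd, bwd], dim=1))
```

For example, `BidirectionalEncoder()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))` returns a 64-channel feature map at the input resolution. On the parameter budget: a depthwise separable 3x3 layer needs 9*C_in + C_in*C_out weights instead of the 9*C_in*C_out of a dense 3x3 convolution, roughly an 8-9x per-layer reduction at typical channel widths; the overall one-third figure the abstract reports presumably reflects applying the substitution only locally within the AdaCoF network, though the paper's exact strategy may differ from this sketch.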




Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2 (March 2023), 540 pages
ISSN: 1551-6857 | EISSN: 1551-6865
DOI: 10.1145/3572860
Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 28 November 2021
• Revised: 17 June 2022
• Accepted: 5 July 2022
• Online AM: 15 July 2022
• Published: 6 February 2023
