DOI: 10.1145/3664647.3681006
Research article, MM '24 Conference Proceedings

Learning to Handle Large Obstructions in Video Frame Interpolation

Published: 28 October 2024

Abstract

Video frame interpolation (VFI) based on optical flow has made great progress in recent years. Most previous studies have focused on improving interpolation quality for clean videos. However, many real-world videos contain large obstructions that make the video content discontinuous. To address this challenge, we propose the Obstruction Robustness Framework (ORF), which enhances the robustness of existing VFI networks in the face of large obstructions. The ORF contains two components: (1) a feature repair module that first captures ambiguous pixels in the synthesized frame via a region similarity map and then repairs them with a cross-overlap attention module; (2) a data augmentation strategy that enables the network to handle dynamic obstructions without extra data. To the best of our knowledge, this is the first work that explicitly addresses errors caused by large obstructions in video frame interpolation. Using previous state-of-the-art methods as backbones, our method not only improves results on the original benchmarks but also significantly enhances interpolation quality for videos with obstructions.
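The abstract mentions an augmentation strategy that simulates dynamic obstructions without collecting extra data. The paper's exact procedure is not given here; the sketch below is only a hypothetical illustration of the general idea, pasting a flat-colored occluder at different positions in two consecutive frames so that the network sees an "obstruction" that moves between inputs. The function name and parameters are assumptions, not the authors' API.

```python
import numpy as np

def add_dynamic_obstruction(frame0, frame1, size=32, rng=None):
    """Paste a flat-colored square at different positions in two
    consecutive frames, simulating a moving obstruction.

    Hypothetical illustration only: the paper's actual augmentation
    strategy is not specified in this abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame0.shape[:2]
    color = rng.integers(0, 256, size=3, dtype=np.uint8)  # random occluder color
    out0, out1 = frame0.copy(), frame1.copy()
    for out in (out0, out1):
        # independent position per frame -> the occluder "moves"
        y = rng.integers(0, h - size)
        x = rng.integers(0, w - size)
        out[y:y + size, x:x + size] = color
    return out0, out1

# usage on a pair of dummy frames
frames = np.zeros((2, 128, 128, 3), dtype=np.uint8)
a, b = add_dynamic_obstruction(frames[0], frames[1], size=32)
```

In a real training pipeline such a transform would be applied on the fly to input frame pairs while the ground-truth intermediate frame is left clean, so the network learns to ignore the obstruction rather than interpolate it.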



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. video frame interpolation
2. cross-attention
3. occlusion/obstruction handling


Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024, Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
