DOI: 10.1145/3581783.3611922

Research article, MM '23 Conference Proceedings

Disentangle Propagation and Restoration for Efficient Video Recovery

Published: 27 October 2023

Abstract

We propose the first framework for accelerating video recovery, which aims to efficiently recover high-quality videos from degraded inputs affected by various deteriorative factors. Although current video recovery methods achieve excellent performance, their significant computational overhead limits their widespread application. To address this, we present a pioneering study on explicitly disentangling temporal and spatial redundant computation by decomposing the input frame into propagation and restoration regions, thereby achieving a significant reduction in computation. Specifically, we leverage contrastive learning to learn degradation-invariant features, which overcome the disturbance of deteriorative factors and enable accurate disentanglement. For the propagation region, we introduce a split-fusion block to handle inter-frame variations, efficiently generating high-quality output at low cost and significantly reducing temporally redundant computation. For the restoration region, we propose an efficient adaptive halting mechanism that requires few extra parameters and can adaptively halt patch processing, considerably reducing spatially redundant computation. Furthermore, we design a patch-adaptive prior regularization to boost efficiency and performance. Our method achieves outstanding results on various video recovery tasks, including video denoising, video deraining, video dehazing, and video super-resolution, with a 50%-60% reduction in GMACs relative to state-of-the-art video recovery methods while maintaining comparable performance.
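The two core ideas in the abstract, splitting each frame into near-static "propagation" patches and changed "restoration" patches, and adaptively halting per-patch processing, can be made concrete with a minimal sketch. Everything below (function names, the raw pixel-difference test, the ACT-style cumulative-probability halting rule) is an illustrative assumption, not the authors' actual implementation:

```python
# Hypothetical sketch of the propagation/restoration split and adaptive
# halting. Names and thresholds are illustrative, not the paper's code.

def patch_mean_abs_diff(prev, cur, y, x, patch):
    """Mean absolute difference between the same patch in two frames."""
    total, count = 0.0, 0
    for i in range(y, y + patch):
        for j in range(x, x + patch):
            total += abs(cur[i][j] - prev[i][j])
            count += 1
    return total / count

def split_patches(prev, cur, patch=4, tau=0.05):
    """Label each patch 'propagate' if it barely changed, else 'restore'."""
    labels = {}
    for y in range(0, len(cur), patch):
        for x in range(0, len(cur[0]), patch):
            diff = patch_mean_abs_diff(prev, cur, y, x, patch)
            labels[(y, x)] = "propagate" if diff < tau else "restore"
    return labels

def adaptive_halting_steps(halt_probs, eps=0.01):
    """ACT-style halting: stop once cumulative halting prob reaches 1 - eps."""
    total = 0.0
    for step, p in enumerate(halt_probs, start=1):
        total += p
        if total >= 1.0 - eps:
            return step
    return len(halt_probs)

# Toy 8x8 frames: only the top-left 4x4 patch changes between frames.
prev = [[0.5] * 8 for _ in range(8)]
cur = [row[:] for row in prev]
for i in range(4):
    for j in range(4):
        cur[i][j] += 0.5

labels = split_patches(prev, cur)
print(labels[(0, 0)])                            # restore
print(labels[(4, 4)])                            # propagate
print(adaptive_halting_steps([0.2, 0.3, 0.6]))   # 3
```

In the paper, the disentanglement relies on learned degradation-invariant features (via contrastive learning) rather than raw pixel differences; the pixel-difference test here is only a stand-in to make the control flow concrete.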

Supplemental Material

MP4 file: video presentation


Cited By

  • (2024) Long-Term Temporal Context Gathering for Neural Video Compression. Computer Vision – ECCV 2024, pp. 305–322. DOI: 10.1007/978-3-031-72848-8_18. Online publication date: 29 Nov 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. contrastive learning
    2. disentangle
    3. efficient video recovery

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

    Article Metrics

    • Downloads (last 12 months): 129
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 05 Mar 2025.

