DOI: 10.1145/3581783.3611922

Research article, MM '23 Conference Proceedings

Disentangle Propagation and Restoration for Efficient Video Recovery

Published: 27 October 2023

Abstract

We propose the first framework for accelerating video recovery, which aims to efficiently recover high-quality videos from degraded inputs affected by various deteriorative factors. Although current video recovery methods achieve excellent performance, their significant computational overhead limits their widespread application. To address this, we present a pioneering study on explicitly disentangling temporal and spatial redundant computation by decomposing the input frame into propagation and restoration regions, thereby achieving a significant reduction in computation. Specifically, we leverage contrastive learning to learn degradation-invariant features, which overcome the disturbance of deteriorative factors and enable accurate disentanglement. For the propagation region, we introduce a split-fusion block to handle inter-frame variations, efficiently generating high-quality output at low cost and significantly reducing temporally redundant computation. For the restoration region, we propose an efficient adaptive halting mechanism that requires few extra parameters and can adaptively halt patch processing, considerably reducing spatially redundant computation. Furthermore, we design a patch-adaptive prior regularization to boost efficiency and performance. Our method achieves outstanding results on various video recovery tasks, including video denoising, video deraining, video dehazing, and video super-resolution, with a 50%-60% reduction in GMACs relative to state-of-the-art video recovery methods while maintaining comparable performance.
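The two core ideas in the abstract, splitting each frame into near-static "propagation" patches and changed "restoration" patches, and adaptively halting per-patch processing, can be made concrete with a minimal sketch. Everything below (function names, the raw pixel-difference test, the ACT-style cumulative-probability halting rule) is an illustrative assumption, not the authors' actual implementation:

```python
# Hypothetical sketch of the propagation/restoration split and adaptive
# halting. Names and thresholds are illustrative, not the paper's code.

def patch_mean_abs_diff(prev, cur, y, x, patch):
    """Mean absolute difference between the same patch in two frames."""
    total, count = 0.0, 0
    for i in range(y, y + patch):
        for j in range(x, x + patch):
            total += abs(cur[i][j] - prev[i][j])
            count += 1
    return total / count

def split_patches(prev, cur, patch=4, tau=0.05):
    """Label each patch 'propagate' if it barely changed, else 'restore'."""
    labels = {}
    for y in range(0, len(cur), patch):
        for x in range(0, len(cur[0]), patch):
            diff = patch_mean_abs_diff(prev, cur, y, x, patch)
            labels[(y, x)] = "propagate" if diff < tau else "restore"
    return labels

def adaptive_halting_steps(halt_probs, eps=0.01):
    """ACT-style halting: stop once cumulative halting prob reaches 1 - eps."""
    total = 0.0
    for step, p in enumerate(halt_probs, start=1):
        total += p
        if total >= 1.0 - eps:
            return step
    return len(halt_probs)

# Toy 8x8 frames: only the top-left 4x4 patch changes between frames.
prev = [[0.5] * 8 for _ in range(8)]
cur = [row[:] for row in prev]
for i in range(4):
    for j in range(4):
        cur[i][j] += 0.5

labels = split_patches(prev, cur)
print(labels[(0, 0)])                            # restore
print(labels[(4, 4)])                            # propagate
print(adaptive_halting_steps([0.2, 0.3, 0.6]))   # 3
```

In the paper, the disentanglement relies on learned degradation-invariant features (via contrastive learning) rather than raw pixel differences; the pixel-difference test here is only a stand-in to make the control flow concrete.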

Supplemental Material

MP4 file: video presentation


Cited By

  • (2024) Long-Term Temporal Context Gathering for Neural Video Compression. Computer Vision – ECCV 2024, pp. 305–322. DOI: 10.1007/978-3-031-72848-8_18. Online publication date: 29 Nov 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. contrastive learning
    2. disentangle
    3. efficient video recovery

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

    Article Metrics

    • Downloads (last 12 months): 129
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 05 Mar 2025.

