
High-quality Frame Recurrent Video De-raining with Multi-contextual Adversarial Network

Published: 11 May 2021 Publication History

Abstract

In this article, we address the problem of rain-streak removal in videos. Unlike image restoration, video restoration must maintain temporal consistency in addition to spatial enhancement. Researchers have proposed several effective methods that estimate de-rained videos with outstanding temporal consistency; however, these methods also incur a high computational cost due to their large model sizes. Our analysis suggests that incorporating separate modules for spatial and temporal enhancement demands more computational resources. This motivates us to propose a unified architecture that directly estimates the de-rained frame with maximal visual quality and minimal computational cost. To this end, we present a deep learning-based Frame-recurrent Multi-contextual Adversarial Network for rain-streak removal in videos. The proposed model is built upon a Conditional Generative Adversarial Network (CGAN) framework in which the generator directly estimates the current de-rained frame from the previously estimated one, guided by its multi-contextual adversary. To optimize the proposed model, we incorporate a perceptual loss in addition to the conventional Euclidean distance. Moreover, instead of the traditional entropy loss from the adversary, we propose to use the Euclidean distance between the features of the de-rained and clean frames, extracted from the discriminator, as a cost function for video de-raining. Experiments across 11 test sets against over 10 state-of-the-art methods, using 14 image-quality metrics, demonstrate the efficacy of the proposed work, both visually and computationally.
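The composite objective described in the abstract (pixel-wise Euclidean distance, plus a perceptual term, plus a Euclidean distance between discriminator features in place of the usual adversarial entropy loss) can be sketched as below. This is a minimal illustration, not the authors' implementation: `phi` is a hypothetical placeholder feature extractor (a simple block-averaging downsample) standing in for both the VGG-based perceptual features and the discriminator's intermediate activations, and the loss weights are illustrative.

```python
import numpy as np

def euclidean(a, b):
    # Euclidean (L2) distance between two arrays of equal shape.
    return float(np.sqrt(np.sum((a - b) ** 2)))

def phi(x, factor=4):
    # Placeholder "feature extractor": block-averaging downsample.
    # Stands in for VGG perceptual features or the discriminator's
    # intermediate activations, which the paper uses instead.
    h = x.shape[0] // factor * factor
    w = x.shape[1] // factor * factor
    x = x[:h, :w]
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def deraining_loss(derained, clean, w_pix=1.0, w_perc=0.1, w_feat=0.1):
    # Total generator objective: pixel-wise L2 + perceptual distance
    # + Euclidean distance between (placeholder) discriminator features,
    # the latter replacing the conventional adversarial entropy loss.
    l_pix = euclidean(derained, clean)
    l_perc = euclidean(phi(derained), phi(clean))
    l_feat = euclidean(phi(derained, factor=8), phi(clean, factor=8))
    return w_pix * l_pix + w_perc * l_perc + w_feat * l_feat

rng = np.random.default_rng(0)
clean = rng.random((32, 32))
estimate = clean + 0.05 * rng.random((32, 32))  # imperfect de-rained frame
print(deraining_loss(clean, clean))      # identical frames -> 0.0
print(deraining_loss(estimate, clean))   # imperfect estimate -> positive
```

In the actual frame-recurrent setting, `derained` at time t would be produced by the generator from the rainy frame and the estimate at time t-1, so the same loss is applied recurrently along the sequence.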

Supplementary Material

a56-sharma-suppl.pdf (sharma.zip)
Supplemental movie, appendix, image, and software files for High-quality Frame Recurrent Video De-raining with Multi-contextual Adversarial Network.


Cited By

  • (2022) Triple-Level Model Inferred Collaborative Network Architecture for Video Deraining. IEEE Transactions on Image Processing 31 (2022), 239--250. DOI: 10.1109/TIP.2021.3128327. Online publication date: 1 Jan. 2022.

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 2
    May 2021
    410 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3461621

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 May 2021
    Accepted: 01 December 2020
    Revised: 01 September 2020
    Received: 01 May 2020
    Published in TOMM Volume 17, Issue 2


    Author Tags

    1. Video de-raining
    2. deep learning
    3. generative adversarial network

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Department of Biotechnology, Govt. of India
    • Ministry of Human Resource Development, Government of India
