MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining

Authors:

Fengqiang Xu

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

Article No.: 67, Page 1

https://doi.org/10.1145/3696409.3700228

Published: 28 December 2024

Abstract

Video deraining is vital in computer vision as rain streaks significantly degrade image quality and impair various outdoor visual tasks. Existing methods often struggle with accurately simulating rain morphology and rely heavily on synthetic data, leading to decreased performance in real-world scenarios. To address these challenges, we introduce the Multi-Scale Spatio-Temporal Mapping Evolution Network (MSTMENet), a novel framework that incorporates multi-scale learning and attention mechanisms through a semi-supervised learning approach. MSTMENet integrates labeled synthetic data with unlabeled real-world data for joint training, effectively bridging the gap between synthetic and real-world applications. The network leverages a Multi-scale Efficient Channel Attention (MECA) mechanism and a residual network to capture intricate spatial features and temporal correlations between video frames while maintaining minimal computational cost. Extensive experiments conducted on synthetic datasets like NTURain, and real-world datasets, including RainSynLight25 and RainSynComplex25, demonstrate MSTMENet’s superior performance compared to SOTA methods. Notably, MSTMENet achieves an improvement of 1.75 dB in Peak Signal-to-Noise Ratio (PSNR) and 0.0061 in Structural Similarity Index (SSIM), underscoring the network’s capability to deliver high-quality deraining results across diverse conditions and scales.
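To put the reported 1.75 dB PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), together with the arithmetic that converts a PSNR gain into a reduction in mean squared error. It is not taken from the paper: the function names (`mse`, `psnr`, `mse_ratio_for_gain`) and the toy pixel values are hypothetical, and an 8-bit peak value of 255 is assumed.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB, 10 * log10(MAX^2 / MSE); infinite when the inputs match."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


# Hypothetical toy data: a flattened clean frame and a derained estimate.
clean = [10, 50, 90, 130, 170, 210]
derained = [12, 49, 93, 128, 171, 207]

print(f"PSNR of toy estimate: {psnr(clean, derained):.2f} dB")
print(f"MSE ratio for a 1.75 dB gain: {mse_ratio_for_gain(1.75):.3f}")
```

Under this definition, a 1.75 dB gain corresponds to the MSE shrinking by a factor of roughly 1.5, which is about a one-third reduction in squared error relative to the compared method.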

Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

December 2024

939 pages

ISBN: 9798400712739

DOI: 10.1145/3696409

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

Video deraining; Spatio-temporal mapping; Multi-scale learning; Attention mechanisms

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
International Science and Technology Cooperation Program of Liaoning Province
Applied Basic Research Program of Liaoning Province
Research Foundation of Liaoning Province
Natural Science Foundation of Liaoning Province

Conference

MMAsia '24

Sponsor:

SIGMM

MMAsia '24: ACM Multimedia Asia

December 3 - 6, 2024

Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
