
Pattern Recognition

Volume 120, December 2021, 108124

Meta-learning based relation and representation learning networks for single-image deraining

https://doi.org/10.1016/j.patcog.2021.108124

Highlights

  • We propose the meta-learning based relation and representation learning networks for single-image deraining.

  • Our proposed method aims to learn the transferable embeddings of rainy images by characterizing the relation between rainy/clean images.

  • The effectiveness of our proposed method is validated through evaluations under different settings against several state-of-the-art algorithms.

Abstract

Single-image deraining is a computer vision task that aims to restore an image degraded by rain streaks, which motivates existing methods to either directly translate the rainy image into its clean counterpart, or indirectly learn the rain residual based on prior information. However, both methodologies harm the generalization ability due to the limited diversity of the training samples, compared with the endless varieties of real-world rainy images. This fact inspires us to take advantage of meta-learning and propose a meta-learning based representation learning network to learn the transferable embeddings of the rainy/clean images, while their discrepancies are characterized by a relation vector generated by the subsequent meta-learning based relation learning network. These networks are integrated into the meta-learning based deraining network (MLDN), which enhances the generalization ability by removing the latent relation vector from the transferable embedding of the rainy image and generates high-quality deraining results. MLDN achieves superior performance, averaging 4% better than the state of the art.

Introduction

Rain is one of the most common weather conditions affecting many outdoor computer vision tasks, e.g., video surveillance [1], [2], person re-identification [3], [4], [5], [6], [7] and object detection [8], [9], [10]. Single-image deraining is a computer vision task that aims to remove the rain streaks from a degraded image while simultaneously keeping the fidelity of the background. Conventional deraining methods usually attempt to remove the implicit feature of the rain streaks from the single rainy image [11], [12]. These algorithms are realized by either modeling the shape of the rain streaks [13] or adopting hand-crafted filters [14]. In addition, deep learning architectures such as convolutional neural networks (CNN) [15], [16], long short-term memory (LSTM) [17] and generative adversarial networks (GAN) [18] have also been employed to map a rainy image to its clean counterpart.

In general, conventional deraining algorithms aim either to directly "translate" the entire rainy image, which includes both the rain streaks and the clean background, into the clean one, or to indirectly characterize the rain residual and remove it from the embedding of the rainy image. However, in a departure from other computer vision tasks, e.g., face recognition, whose samples follow similar distributions, the backgrounds of rainy images show endless varieties. This fact can upset the balance between removing the rain streaks and preserving the fidelity of the background when one type of background lies far from the data distribution of the training set, resulting in blurred images. Moreover, rainy images collected from the real world lack ground truths, which limits the performance of conventional deep learning based deraining methods trained on synthesized datasets.

Recently, meta-learning, an automatic machine learning technique designed for "learning to learn", has been applied to many AI tasks [19], [20]. Its most common implementation is few-shot learning, where a meta-learning based discriminator is trained with only a few samples collected from different categories. These samples can be treated as "metadata" that help the discriminator depict the general distribution of the training data, so that it can rapidly adjust itself to unseen samples. In this regard, the encoding layers of the discriminator are capable of learning the transferable embeddings of the input samples [21].
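For concreteness, the episodic few-shot training regime referred to above can be sketched as follows. This is a minimal sketch assuming a generic image collection grouped by class; all names (e.g., sample_episode, images_by_class) are hypothetical and are not taken from the paper.

```python
# Minimal sketch of N-way K-shot episode construction (illustrative only).
import random

def sample_episode(images_by_class, n_way=2, k_shot=5, q_queries=5):
    """Sample one few-shot episode from a dict {class_id: [image, ...]}."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(images_by_class[cls], k_shot + q_queries)
        support += [(img, label) for img in picks[:k_shot]]   # the "metadata" the learner adapts from
        query += [(img, label) for img in picks[k_shot:]]     # unseen samples it must generalize to
    return support, query
```

The learner is updated episode by episode on the query sets, which is what encourages the encoding layers to produce embeddings that transfer to categories unseen during training.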

This observation inspires us to learn the transferable embeddings of the rainy/clean images and the rain streaks, which can be rapidly transferred to unseen samples and enable the deraining network to balance deraining performance and generalization. We demonstrate several deraining results of our meta-learning based deraining network (MLDN) in Fig. 1. Although the backgrounds of the input rainy images differ substantially, our deraining network still achieves satisfactory results. More results can be found in Section 4.

To achieve superior deraining results on real-world rainy images, the major issue is to accurately disentangle and remove the rain streaks. We observe that although the backgrounds vary widely, the relation between rainy images and clean images is clear: the existence of similar rain streaks. This observation motivates us to characterize the discrepancy between rainy/clean images so as to provide the deraining network with an accurate target to remove. Hence, we first propose a task-tailored meta-learning based relation network that preserves the transferable embeddings of the rain streaks in a relation vector, which is generated by forcing rainy images with different backgrounds to have higher relevance scores, owing to their shared rain streaks.
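As a rough illustration of this idea, the sketch below shows how a relation module could be trained to assign high relevance scores to pairs of rainy images (which share rain streaks, whatever their backgrounds) and low scores to rainy/clean pairs. It is a minimal scalar-score variant in the spirit of relation networks [21]; the layer sizes, loss, and module names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: relevance scoring between image embeddings (illustrative only).
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Takes a pair of image embeddings, outputs a relevance score in [0, 1].
        self.score = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, emb_a, emb_b):
        return self.score(torch.cat([emb_a, emb_b], dim=-1))

# Training signal sketched in the text: two rainy images with different
# backgrounds should relate strongly (target 1) because they share rain
# streaks; a rainy/clean pair should relate weakly (target 0).
relation = RelationModule()
emb_rainy_a, emb_rainy_b, emb_clean = (torch.randn(4, 256) for _ in range(3))
loss = nn.functional.mse_loss(relation(emb_rainy_a, emb_rainy_b), torch.ones(4, 1)) \
     + nn.functional.mse_loss(relation(emb_rainy_a, emb_clean), torch.zeros(4, 1))
```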

However, due to the endless diversity of real-world rainy images, a well-characterized relation vector alone is not enough for the deraining task. This drawback reveals another crucial problem within the deraining task, i.e., the limited diversity of the training samples, which harms the generalization ability of conventional deraining methods trained on synthesized datasets. To address this issue, we aim to learn the transferable embeddings of the rainy/clean images based on the previously learned relation vector by proposing a meta-learning based discriminator. The discriminator is first trained to distinguish rainy from clean images, which can be treated as a 2-way K-shot classification task. In contrast to conventional meta-learning methods, we seamlessly integrate the relation vector into the discriminator, which distinguishes rainy/clean images by evaluating the distances between the images and the feature of the rain streaks. Thus, the encoding layers of the discriminator can be treated as the representation learning network, which has to explore the information within the rainy image that is closely related to the rain streaks and simultaneously encode the information within the clean image that is far from the rain streaks. Finally, we propose a meta-learning based deraining network (MLDN) to generate the clean image by removing the feature of the rain streaks from the embedding of the rainy image.
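The discriminator described above can be loosely sketched as follows: an encoder (standing in for the representation learning network) embeds a 2-way K-shot batch of rainy/clean images, and each image is classified by its distance to a relation vector summarizing the rain streaks. All module definitions and the distance-based decision rule are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch: 2-way rainy/clean discrimination via distance to the
# rain-streak relation vector (illustrative only).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the representation learning network (encoding layers)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))

    def forward(self, x):
        return self.net(x)

def rainy_logit(embedding, relation_vec):
    # Higher logit (embedding closer to the rain-streak vector) -> more likely rainy.
    return -torch.cdist(embedding, relation_vec.unsqueeze(0)).squeeze(-1)

encoder = Encoder()
images = torch.randn(8, 3, 64, 64)               # a 2-way K-shot batch of rainy/clean crops
labels = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = rainy, 0 = clean
relation_vec = torch.randn(256)                  # placeholder for the learned relation vector
logits = rainy_logit(encoder(images), relation_vec)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
```

Training the encoder under this objective is what forces it to organize the embedding space around the rain-streak feature, which is the property the deraining network later exploits.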

As shown in Fig. 2, the architecture of MLDN includes three components. The first component is the representation learning network (Fig. 2(a)), which is part of the meta-learning based discriminator and learns the embeddings of the rainy/clean images. The second component is the relation network (Fig. 2(b)), the classifier of the meta-learning based discriminator, which characterizes the relation between rainy/clean images; the obtained relation vector can be seen as the target that needs to be removed. The third component is the deraining network (Fig. 2(c)), which generates the clean image by removing the feature of the rain streaks from the embedding of the rainy image.
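Under the assumptions that the relation vector can be produced from the rainy embedding and removed by simple subtraction (neither of which is specified here, and both of which are simplifications), the composition of the three components might be sketched as follows.

```python
# Minimal sketch of composing the three MLDN components (illustrative only).
import torch
import torch.nn as nn

class MLDNSketch(nn.Module):
    def __init__(self, encoder, relation_net, decoder):
        super().__init__()
        self.encoder = encoder            # (a) representation learning network
        self.relation_net = relation_net  # (b) relation network -> relation vector
        self.decoder = decoder            # (c) deraining (generation) network

    def forward(self, rainy):
        emb = self.encoder(rainy)                # transferable embedding of the rainy image
        relation_vec = self.relation_net(emb)    # feature of the rain streaks to remove
        return self.decoder(emb - relation_vec)  # generate the clean image

# Exercising the composition with placeholder modules:
model = MLDNSketch(nn.Linear(16, 8), nn.Linear(8, 8), nn.Linear(8, 16))
out = model(torch.randn(2, 16))
```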

To sum up, our major contributions are summarized as follows:

  • To address the issue that conventional deep learning based deraining methods are trained on samples with limited diversity, regardless of the endless varieties of the backgrounds, we propose a task-tailored meta-learning based relation network to characterize the general relation between rainy/clean images in a relation vector, which helps to distinguish the rain streaks from different backgrounds.

  • To address the lack of ground truths for rainy images collected from the real world, we propose a meta-learning based representation learning network to learn the transferable embeddings of the rainy/clean images and improve the generalization ability.

  • We seamlessly integrate the relation network along with the representation learning network into an end-to-end meta-learning based deraining network (MLDN). Our experimental results demonstrate the advantages of our MLDN, especially on real-world rainy images.

The rest of the paper is organized as follows. Section 2 discusses related works. Section 3 presents the MLDN, while Section 4 offers the experimental results and comparative analysis. We conclude the paper in Section 5.

Section snippets

Meta-learning based algorithms

Meta-learning is an automatic machine learning technique that aims at learning to learn. Unlike conventional deep learning algorithms, which have limited generalization capability and are hard to train, meta-learning based algorithms can easily adapt or generalize to new tasks and new environments that have never been seen during the training stage.

Meta-learning has been applied to conventional machine learning algorithms to provide agnostic models. Wang et al. [22]

MLDN

Single-image deraining is a computer vision task that aims to reconstruct the clean background from an image degraded by rain streaks. A well-designed deraining network should first disentangle the rain streaks and subsequently remove them while keeping the fidelity of the background. However, there are two intractable problems within the deraining task. The first problem is the endless variety of the backgrounds. In a departure from the computer vision task e.g., face

Experiments and analysis

In this section, we conduct various experiments to verify the performance of MLDN. We first conduct ablation studies to evaluate the effectiveness of each component and the impact of the major parameters. We subsequently compare MLDN against various state-of-the-art deraining algorithms. In addition, to prove the generalization ability of MLDN, we apply our method to rainy images collected from the real world and demonstrate the intuitive deraining results.

The deraining performance is assessed by

Conclusion

In this paper, we propose a meta-learning based deraining network (MLDN), which consists of a meta-learning based relation network and a meta-learning based representation learning network. We learn the transferable embeddings of the rainy/clean images by incorporating the meta-learning technique and depict the underlying discrepancy between rainy/clean images by learning a relation vector, which facilitates the generalization ability of the deraining task. The effectiveness of MLDN is demonstrated

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (U1936217, 61806066, 61806035, 61672365, 62072151).

References (57)

  • T.-N. Nguyen et al., Anomaly detection in video sequence with appearance-motion correspondence, Proceedings of the IEEE International Conference on Computer Vision (2019)
  • L. Zheng et al., Pose-invariant embedding for deep person re-identification, IEEE Trans. Image Process. (2019)
  • L. Wu et al., Few-shot deep adversarial learning for video-based person re-identification, IEEE Trans. Image Process. (2019)
  • L. Wu et al., Cross-entropy adversarial view adaptation for person re-identification, IEEE Trans. Circuits Syst. Video Technol. (2019)
  • Z.-Q. Zhao et al., Object detection with deep learning: a review, IEEE Trans. Neural Netw. Learn. Syst. (2019)
  • A. Borji et al., Salient object detection: a survey, Comput. Vis. Media (2019)
  • L. Wu et al., Deep attention-based spatially recursive networks for fine-grained visual recognition, IEEE Trans. Cybern. (2018)
  • H. Zhang et al., Density-aware single image de-raining using a multi-stream dense network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • G. Wang et al., ERL-Net: entangled representation learning for single image de-raining, Proceedings of the IEEE International Conference on Computer Vision (2019)
  • T. Wang et al., Spatial attentive single-image deraining with a high quality real rain dataset, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • X. Lin et al., Utilizing two-phase processing with FBLS for single image deraining, IEEE Trans. Multimed. (2020)
  • R. Yasarla et al., Uncertainty guided multi-scale residual learning-using a cycle spinning CNN for single image de-raining, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • W. Yang et al., Deep joint rain detection and removal from a single image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • D. Ren et al., Progressive image deraining networks: a better and simpler baseline, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in neural networks: a survey, arXiv preprint...
  • Y. Wang, Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion, ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) (2021)
  • F. Sung et al., Learning to compare: relation network for few-shot learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • Y.-X. Wang et al., Meta-learning to detect rare objects, Proceedings of the IEEE International Conference on Computer Vision (2019)

    Xinjian Gao received his Ph.D. in signal and information processing from the Hefei University of Technology, China, in 2017. He is currently a Lecturer in the School of Computer Science and Information Engineering, Hefei University of Technology. His research interests include machine learning, multimedia, and pattern recognition.

    Yang Wang is currently a Full Professor at Hefei University of Technology, China. He has published more than 70 research papers, with Google Scholar Citations 3000+, H-index 27. His research interests include deep learning over visual recognition, machine learning and multimedia analytics. He is currently serving as an Associate Editor of ACM Transactions on Information Systems.

    Jun Cheng received the B.Eng. and M.Eng. degrees from the University of Science and Technology of China, Hefei, China, in 1999 and 2002, respectively, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2006. He is currently a Professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and the Director of the Laboratory for Human Machine Control. His current research interests include computer vision, robotics, machine intelligence, and control.

    Mingliang Xu is a full professor in the School of Information Engineering of Zhengzhou University, China, and currently is the director of CIISR (Center for Interdisciplinary Information Science Research) and the vice General Secretary of ACM SIGAI China. He received his Ph.D. degree in computer science and technology from the State Key Lab of CAD&CG at Zhejiang University, Hangzhou, China. His current research interests include computer graphics and artificial intelligence. He has authored more than 80 journal and conference papers in these areas, including ACM TOG, ACM TIST, IEEE TPAMI, IEEE TIP, IEEE TCYB, IEEE TCSVT, IEEE TAC, IEEE TVCG, ACM SIGGRAPH (Asia), CVPR, ACM MM, IJCAI, etc.

    Meng Wang is a Full Professor at Hefei University of Technology, China. He received his B.E. degree and Ph.D. degree in the Special Class for the Gifted Young and the Department of Electronic Engineering and Information Science from the University of Science and Technology of China (USTC), Hefei, China, in 2003 and 2008, respectively. His current research interests include multimedia content analysis, computer vision, and pattern recognition. He has authored more than 200 book chapters, journal and conference papers in these areas. He is the recipient of the ACM SIGMM Rising Star Award 2014. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), and IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS).
