
Pattern Recognition

Volume 120, December 2021, 108124

Meta-learning based relation and representation learning networks for single-image deraining

https://doi.org/10.1016/j.patcog.2021.108124

Highlights

  • We propose the meta-learning based relation and representation learning networks for single-image deraining.

  • Our proposed method aims to learn the transferable embeddings of rainy images by characterizing the relation between rainy/clean images.

  • The effectiveness of our proposed method is validated through evaluations under different settings against several state-of-the-art algorithms.

Abstract

Single-image deraining is a computer vision task that aims to restore an image degraded by rain streaks, which motivates existing methods to either directly translate the rainy image into its clean counterpart, or indirectly learn the rain residual based on prior information. However, both methodologies harm the generalization ability due to the limited diversity of the training samples, compared with the endless varieties of real-world rainy images. This fact inspires us to take advantage of meta-learning and propose a meta-learning based representation learning network to learn the transferable embeddings of the rainy/clean images, while their discrepancies are characterized by a relation vector generated by the subsequent meta-learning based relation learning network. These networks are integrated into the meta-learning based deraining network (MLDN), which enhances the generalization ability by removing the latent relation vector from the transferable embedding of the rainy image and generates high-quality deraining results. MLDN achieves superior performance, averaging 4% better than the state of the art.

Introduction

Rain is one of the most common weather conditions affecting many outdoor computer vision tasks, e.g., video surveillance [1], [2], person re-identification [3], [4], [5], [6], [7] and object detection [8], [9], [10]. Single-image deraining is a computer vision task that aims to remove the rain streaks from a degraded image while simultaneously keeping the fidelity of the background. Conventional deraining methods usually attempt to remove the implicit feature of the rain streaks from the single rainy image [11], [12]. These algorithms are realized by either modeling the shape of the rain streaks [13] or adopting hand-crafted filters [14]. In addition, deep learning architectures such as convolutional neural networks (CNN) [15], [16], long short-term memory (LSTM) [17] and generative adversarial networks (GAN) [18] have also been employed to map a rainy image to its clean counterpart.

In general, conventional deraining algorithms aim either to directly "translate" the entire rainy image, which includes both the rain streaks and the clean background, into the clean one, or to indirectly characterize the rain residual and remove it from the embedding of the rainy image. However, in a departure from other computer vision tasks, e.g., face recognition, whose samples follow similar distributions, the backgrounds of rainy images show endless varieties. This fact can upset the balance between removing the rain streaks and preserving the fidelity of the background when one type of background lies far from the data distribution of the training set, resulting in blurred images. Moreover, rainy images collected from the real world lack ground truths, which limits the performance of conventional deep learning based deraining methods trained on synthesized datasets.

Recently, meta-learning, an automatic machine learning technique designed for "learning to learn", has been applied to many AI tasks [19], [20]. Its most common implementation is few-shot learning, where a meta-learning based discriminator is trained with only a few samples collected from different categories. These samples can be treated as "metadata" that help the discriminator depict the general distribution of the training data, so that it can rapidly adjust itself to unseen samples. In this regard, the encoding layers of the discriminator are capable of learning the transferable embeddings of the input samples [21].
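For concreteness, the episodic few-shot training regime referred to above can be sketched as follows. This is a minimal sketch assuming a generic image collection grouped by class; all names (e.g., sample_episode, images_by_class) are hypothetical and are not taken from the paper.

```python
# Minimal sketch of N-way K-shot episode construction (illustrative only).
import random

def sample_episode(images_by_class, n_way=2, k_shot=5, q_queries=5):
    """Sample one few-shot episode from a dict {class_id: [image, ...]}."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(images_by_class[cls], k_shot + q_queries)
        support += [(img, label) for img in picks[:k_shot]]   # the "metadata" the learner adapts from
        query += [(img, label) for img in picks[k_shot:]]     # unseen samples it must generalize to
    return support, query
```

The learner is updated episode by episode on the query sets, which is what encourages the encoding layers to produce embeddings that transfer to categories unseen during training.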

This observation inspires us to learn the transferable embeddings of the rainy/clean images and the rain streaks, which can be rapidly transferred to unseen samples and enable the deraining network to balance deraining performance and generalization. We demonstrate several deraining results of our meta-learning based deraining network (MLDN) in Fig. 1. Although the backgrounds of the input rainy images differ substantially, our deraining network still achieves satisfactory results. More results can be found in Section 4.

To achieve superior deraining results on real-world rainy images, the major issue is to accurately disentangle and remove the rain streaks. We observe that although the backgrounds vary widely, the relation between rainy images and clean images is clear: the existence of similar rain streaks. This observation motivates us to characterize the discrepancy between rainy/clean images so as to provide the deraining network with an accurate target to remove. Hence, we first propose a task-tailored meta-learning based relation network that preserves the transferable embeddings of the rain streaks in a relation vector, which is generated by forcing rainy images with different backgrounds to have higher relevance scores, owing to their shared rain streaks.
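As a rough illustration of this idea, the sketch below shows how a relation module could be trained to assign high relevance scores to pairs of rainy images (which share rain streaks, whatever their backgrounds) and low scores to rainy/clean pairs. It is a minimal scalar-score variant in the spirit of relation networks [21]; the layer sizes, loss, and module names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: relevance scoring between image embeddings (illustrative only).
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Takes a pair of image embeddings, outputs a relevance score in [0, 1].
        self.score = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, emb_a, emb_b):
        return self.score(torch.cat([emb_a, emb_b], dim=-1))

# Training signal sketched in the text: two rainy images with different
# backgrounds should relate strongly (target 1) because they share rain
# streaks; a rainy/clean pair should relate weakly (target 0).
relation = RelationModule()
emb_rainy_a, emb_rainy_b, emb_clean = (torch.randn(4, 256) for _ in range(3))
loss = nn.functional.mse_loss(relation(emb_rainy_a, emb_rainy_b), torch.ones(4, 1)) \
     + nn.functional.mse_loss(relation(emb_rainy_a, emb_clean), torch.zeros(4, 1))
```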

However, due to the endless diversity of real-world rainy images, a well-characterized relation vector alone is not enough for the deraining task. This drawback reveals another crucial problem within the deraining task, i.e., the limited diversity of the training samples, which harms the generalization ability of conventional deraining methods trained on synthesized datasets. To address this issue, we aim to learn the transferable embeddings of the rainy/clean images based on the previously learned relation vector by proposing a meta-learning based discriminator. The discriminator is first trained to distinguish rainy from clean images, which can be treated as a 2-way K-shot classification task. In contrast to conventional meta-learning methods, we seamlessly integrate the relation vector into the discriminator, which distinguishes rainy/clean images by evaluating the distances between the images and the feature of the rain streaks. Thus, the encoding layers of the discriminator can be treated as the representation learning network, which has to explore the information within the rainy image that is closely related to the rain streaks and simultaneously encode the information within the clean image that is far from the rain streaks. Finally, we propose a meta-learning based deraining network (MLDN) to generate the clean image by removing the feature of the rain streaks from the embedding of the rainy image.
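The discriminator described above can be loosely sketched as follows: an encoder (standing in for the representation learning network) embeds a 2-way K-shot batch of rainy/clean images, and each image is classified by its distance to a relation vector summarizing the rain streaks. All module definitions and the distance-based decision rule are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch: 2-way rainy/clean discrimination via distance to the
# rain-streak relation vector (illustrative only).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the representation learning network (encoding layers)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))

    def forward(self, x):
        return self.net(x)

def rainy_logit(embedding, relation_vec):
    # Higher logit (embedding closer to the rain-streak vector) -> more likely rainy.
    return -torch.cdist(embedding, relation_vec.unsqueeze(0)).squeeze(-1)

encoder = Encoder()
images = torch.randn(8, 3, 64, 64)               # a 2-way K-shot batch of rainy/clean crops
labels = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = rainy, 0 = clean
relation_vec = torch.randn(256)                  # placeholder for the learned relation vector
logits = rainy_logit(encoder(images), relation_vec)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
```

Training the encoder under this objective is what forces it to organize the embedding space around the rain-streak feature, which is the property the deraining network later exploits.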

As shown in Fig. 2, the architecture of MLDN includes three components. The first component is the representation learning network (Fig. 2(a)), which is part of the meta-learning based discriminator and learns the embeddings of the rainy/clean images. The second component is the relation network (Fig. 2(b)), the classifier of the meta-learning based discriminator, which characterizes the relation between rainy/clean images; the obtained relation vector can be seen as the target that needs to be removed. The third component is the deraining network (Fig. 2(c)), which generates the clean image by removing the feature of the rain streaks from the embedding of the rainy image.
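Under the assumptions that the relation vector can be produced from the rainy embedding and removed by simple subtraction (neither of which is specified here, and both of which are simplifications), the composition of the three components might be sketched as follows.

```python
# Minimal sketch of composing the three MLDN components (illustrative only).
import torch
import torch.nn as nn

class MLDNSketch(nn.Module):
    def __init__(self, encoder, relation_net, decoder):
        super().__init__()
        self.encoder = encoder            # (a) representation learning network
        self.relation_net = relation_net  # (b) relation network -> relation vector
        self.decoder = decoder            # (c) deraining (generation) network

    def forward(self, rainy):
        emb = self.encoder(rainy)                # transferable embedding of the rainy image
        relation_vec = self.relation_net(emb)    # feature of the rain streaks to remove
        return self.decoder(emb - relation_vec)  # generate the clean image

# Exercising the composition with placeholder modules:
model = MLDNSketch(nn.Linear(16, 8), nn.Linear(8, 8), nn.Linear(8, 16))
out = model(torch.randn(2, 16))
```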

To sum up, our major contributions are summarized as follows:

  • To address the issue that conventional deep learning based deraining methods are trained on samples with limited diversity, regardless of the endless varieties of the backgrounds, we propose a task-tailored meta-learning based relation network to characterize the general relation between rainy/clean images in a relation vector, which helps to distinguish the rain streaks from different backgrounds.

  • To address the lack of ground truths for rainy images collected from the real world, we propose a meta-learning based representation learning network to learn the transferable embeddings of the rainy/clean images and improve the generalization ability.

  • We seamlessly integrate the relation network along with the representation learning network into an end-to-end meta-learning based deraining network (MLDN). Our experimental results demonstrate the advantages of our MLDN, especially on real-world rainy images.

The rest of the paper is organized as follows. Section 2 discusses related works. Section 3 presents the MLDN, while Section 4 offers the experimental results and comparative analysis. We conclude the paper in Section 5.

Section snippets

Meta-learning based algorithms

Meta-learning is an automatic machine learning technique that aims at learning to learn. Unlike conventional deep learning algorithms, which have limited generalization capability and are hard to train, meta-learning based algorithms can easily adapt or generalize to new tasks and new environments that have never been seen during the training stage.

Meta-learning has been applied to conventional machine learning algorithms to provide agnostic models. Wang et al. [22]

MLDN

Single-image deraining is a computer vision task that aims to reconstruct the clean background from an image degraded by rain streaks. A well-designed deraining network should first disentangle the rain streaks and subsequently remove them while keeping the fidelity of the background. However, there are two intractable problems within the deraining task. The first problem is the endless variety of the backgrounds. In a departure from the computer vision task e.g., face

Experiments and analysis

In this section, we conduct various experiments to verify the performance of MLDN. We first conduct ablation studies to evaluate the effectiveness of each component and the impact of the major parameters. We subsequently compare MLDN against various state-of-the-art deraining algorithms. In addition, to prove the generalization ability of MLDN, we apply our method to rainy images collected from the real world and demonstrate the intuitive deraining results.

The deraining performance is assessed by

Conclusion

In this paper, we propose a meta-learning based deraining network (MLDN), which consists of a meta-learning based relation network and a meta-learning based representation learning network. We learn the transferable embeddings of the rainy/clean images by incorporating the meta-learning technique and depict the underlying discrepancy between rainy/clean images by learning a relation vector, which facilitates the generalization ability of the deraining task. The effectiveness of MLDN is demonstrated

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (U1936217, 61806066, 61806035, 61672365, 62072151).

References (57)

  • T.-N. Nguyen et al., Anomaly detection in video sequence with appearance-motion correspondence, Proceedings of the IEEE International Conference on Computer Vision (2019)
  • L. Zheng et al., Pose-invariant embedding for deep person re-identification, IEEE Trans. Image Process. (2019)
  • L. Wu et al., Few-shot deep adversarial learning for video-based person re-identification, IEEE Trans. Image Process. (2019)
  • L. Wu et al., Cross-entropy adversarial view adaptation for person re-identification, IEEE Trans. Circuits Syst. Video Technol. (2019)
  • Z.-Q. Zhao et al., Object detection with deep learning: a review, IEEE Trans. Neural Netw. Learn. Syst. (2019)
  • A. Borji et al., Salient object detection: a survey, Comput. Vis. Media (2019)
  • L. Wu et al., Deep attention-based spatially recursive networks for fine-grained visual recognition, IEEE Trans. Cybern. (2018)
  • H. Zhang et al., Density-aware single image de-raining using a multi-stream dense network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • G. Wang et al., ERL-Net: entangled representation learning for single image de-raining, Proceedings of the IEEE International Conference on Computer Vision (2019)
  • T. Wang et al., Spatial attentive single-image deraining with a high quality real rain dataset, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • X. Lin et al., Utilizing two-phase processing with FBLS for single image deraining, IEEE Trans. Multimed. (2020)
  • R. Yasarla et al., Uncertainty guided multi-scale residual learning-using a cycle spinning CNN for single image de-raining, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • W. Yang et al., Deep joint rain detection and removal from a single image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • D. Ren et al., Progressive image deraining networks: a better and simpler baseline, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in neural networks: a survey, arXiv preprint...
  • Y. Wang, Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion, ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) (2021)
  • F. Sung et al., Learning to compare: relation network for few-shot learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • Y.-X. Wang et al., Meta-learning to detect rare objects, Proceedings of the IEEE International Conference on Computer Vision (2019)

    Xinjian Gao received his Ph.D. in signal and information processing from the Hefei University of Technology, China, in 2017. He is currently a Lecturer in the School of Computer Science and Information Engineering, Hefei University of Technology. His research interests include machine learning, multimedia, and pattern recognition.

    Yang Wang is currently a Full Professor at Hefei University of Technology, China. He has published more than 70 research papers, with Google Scholar Citations 3000+, H-index 27. His research interests include deep learning over visual recognition, machine learning and multimedia analytics. He is currently serving as an Associate Editor of ACM Transactions on Information Systems.

    Jun Cheng received the B.Eng. and M.Eng. degrees from the University of Science and Technology of China, Hefei, China, in 1999 and 2002, respectively, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2006. He is currently a Professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and the Director of the Laboratory for Human Machine Control. His current research interests include computer vision, robotics, machine intelligence, and control.

    Mingliang Xu is a full professor in the School of Information Engineering of Zhengzhou University, China, and currently is the director of CIISR (Center for Interdisciplinary Information Science Research) and the vice General Secretary of ACM SIGAI China. He received his Ph.D. degree in computer science and technology from the State Key Lab of CAD&CG at Zhejiang University, Hangzhou, China. His current research interests include computer graphics and artificial intelligence. He has authored more than 80 journal and conference papers in these areas, including ACM TOG, ACM TIST, IEEE TPAMI, IEEE TIP, IEEE TCYB, IEEE TCSVT, IEEE TAC, IEEE TVCG, ACM SIGGRAPH (Asia), CVPR, ACM MM, IJCAI, etc.

    Meng Wang is a Full Professor at Hefei University of Technology, China. He received his B.E. degree and Ph.D. degree in the Special Class for the Gifted Young and the Department of Electronic Engineering and Information Science from the University of Science and Technology of China (USTC), Hefei, China, in 2003 and 2008, respectively. His current research interests include multimedia content analysis, computer vision, and pattern recognition. He has authored more than 200 book chapters, journal and conference papers in these areas. He is the recipient of the ACM SIGMM Rising Star Award 2014. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), and IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS).
