
Pattern Recognition

Volume 129, September 2022, 108713

ECDNet: A bilateral lightweight cloud detection network for remote sensing images

https://doi.org/10.1016/j.patcog.2022.108713

Highlights

  • We propose a neural network: ECDNet. It consists of a lightweight two-pathway encoder and an extremely lightweight decoder.

  • In the encoder, the dense pyramid module (DPM) is designed to have large and diverse receptive fields in feature extraction.

  • In the encoder, the fusion module (FM) is developed to fuse detail and semantic information more efficiently.

  • The experimental results on LandSat8 and MODIS demonstrate that ECDNet achieves state-of-the-art performance.

Abstract

Cloud detection is one of the critical tasks in remote sensing image pre-processing and has attracted extensive research interest. In recent years, deep neural network-based cloud detection methods have surpassed traditional methods (threshold-based methods and conventional machine learning-based methods). However, current approaches mainly focus on improving detection accuracy, while computational complexity and model size are ignored. To tackle this problem, we propose a lightweight deep learning cloud detection model: the Efficient Cloud Detection Network (ECDNet). The model is based on an encoder-decoder structure. In the encoder, a two-path architecture is proposed to extract spatial and semantic information concurrently. One pathway is the detail branch, designed to capture low-level spatial detail features with only a few parameters. The other pathway is the semantic branch, which mainly captures context features. In the semantic branch, a dense pyramid module (DPM) is designed for multi-scale contextual information extraction; its parameter count and computation are greatly reduced by feature reuse. Besides, a FusionBlock is developed to merge these two kinds of information. Then an extremely lightweight decoder recovers the cloud mask to the same scale as the input image step by step. To improve performance, a boost loss is introduced without any increase in inference cost. We evaluate the proposed method on two public datasets: LandSat8 and MODIS. Extensive experiments demonstrate that the proposed ECDNet achieves accuracy comparable to state-of-the-art cloud detection methods while having a much smaller model size and lower computational burden.

Introduction

Cloud detection is an essential step in remote sensing. Two-thirds of the earth's surface is covered by clouds [38]. When the earth's surface is the objective of satellite imaging, clouds are treated as noise and should be removed in the pre-processing phase. This phase ensures the high quality of various remote sensing applications, such as land cover classification [38], environment observation [17], and vegetation engineering [30]. However, the strict bandwidth constraints on downlink transmission conflict with transferring high-resolution hyperspectral images from satellites [8]. Recently, the trend of deploying remote sensing applications directly on-board satellites has attracted increasing attention [12]. Therefore, cloud detection methods that can be implemented and executed on satellites are in great demand.

In practice, traditional methods for the cloud detection problem can be divided into two groups: threshold-based methods and machine learning-based methods. Threshold-based approaches utilize the spectral and wavelength differences between clouds and other objects for cloud identification [40]. However, determining an appropriate threshold is computationally expensive. In addition, for multi-spectral satellite imagery with only four bands (red, green, blue, and near-infrared), threshold methods are usually not robust enough. To overcome this limitation, researchers adopted machine learning techniques for cloud detection, e.g., support vector machines [22] and random forests [7]. Unfortunately, their performance relies heavily on hand-crafted features, which usually contain insufficient distinguishable information. Therefore, it is difficult to discriminate clouds from other objects with such features, especially in complicated cases.
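As a toy illustration of the threshold-based family (not any specific published algorithm), a single-brightness-threshold cloud mask on a synthetic four-band patch might look like the sketch below; the band layout and the 0.6 cutoff are purely illustrative assumptions:

```python
import numpy as np

# Toy threshold-based cloud masking on a synthetic 4-band patch.
# Band order (red, green, blue, near-infrared) and the 0.6 cutoff are
# illustrative assumptions, not values from any published method.
rng = np.random.default_rng(0)
patch = rng.random((4, 8, 8))          # (bands, height, width), reflectance in [0, 1]
brightness = patch[:3].mean(axis=0)    # mean visible reflectance per pixel
cloud_mask = brightness > 0.6          # clouds tend to be bright in the visible bands

print(cloud_mask.shape)                # boolean mask, same spatial size as the patch
```

Real threshold algorithms combine many spectral tests and dynamically tuned cutoffs, which is exactly why a single fixed threshold like this one is not robust on four-band imagery.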

Recently, with the rise of deep learning and its success in computer vision [27] and remote sensing image processing [13], cloud detection methods based on convolutional neural networks (CNNs) have been widely investigated [20], [24], [31]. In these methods, features are extracted automatically via CNNs, and the derived methods have achieved significant performance improvements. However, these CNN-based models [31] tend to require a great number of parameters to achieve satisfactory performance, while satellites, as space devices, have limited onboard resources such as storage, computation, and power. Therefore, it is impractical to directly deploy existing CNN-based cloud detection methods on-board.

Since semantic segmentation is similar to cloud detection, e.g., both associate each pixel of an image with a class label [27], another possible solution is deploying lightweight semantic segmentation models for onboard cloud detection. In recent decades, researchers have achieved significant improvements [10], [27], [28], [39] in lightweight semantic segmentation for Natural Scene Images (NSIs). However, there is an obvious gap between NSIs and Remote Sensing Images (RSIs). First, objects in RSIs exhibit larger intra-class and smaller inter-class feature variance than those in NSIs. For example, because of the influence of the background, the features of thin clouds differ from those of thick clouds; besides, snow and ice clouds have quite similar features in most RSI bands [31]. Second, object boundaries, especially cloud boundaries, are unclear, e.g., in the thin cloud situation [6]. Therefore, it is unreliable to directly apply these semantic segmentation methods.

Recently, some researchers have paid attention to efficient cloud detection methods. In [8], a CNN-based model is proposed for nanosatellites to select eligible images to transmit to the ground. Even though the network is designed for low power consumption and low inference latency, its accuracy is low. In [16], a lightweight cloud detection network is designed for Sentinel-2A images. In this model, the number of parameters is reduced by using depthwise separable convolution and sharing kernels between channels in the feature extraction blocks. However, kernel sharing does not reduce the computational complexity, so the computation burden remains. In conclusion, challenges still exist in onboard cloud detection.
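To make the parameter savings from depthwise separable convolution concrete, here is a quick back-of-the-envelope weight count (biases ignored); the 64-channel, 3 x 3 configuration is an arbitrary example, not a setting from the paper:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel) followed
    by a 1 x 1 pointwise convolution that mixes channels (bias ignored)."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)        # 64 * 64 * 9 = 36864 weights
separable = separable_params(64, 64, 3)  # 64 * 9 + 64 * 64 = 4672 weights
print(standard, separable)               # roughly an 8x reduction in weights
```

Note that this saving is in parameters; as the paragraph above points out, tricks such as kernel sharing can shrink the weight count without reducing the number of multiply-accumulate operations actually executed.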

We observe that cloud pixels are usually not isolated from one another. Even though a thin cloud pixel has different features from a thick cloud pixel and its spectral feature is heavily influenced by the background, when it is surrounded by cloud pixels it is highly likely to be cloud. Therefore, multi-scale context information is crucial for cloud detection, and the first challenge is how to make full use of it. Since current lightweight cloud detection methods only focus on minimizing the number of parameters, the second challenge for effective and efficient onboard cloud detection is reducing the computational complexity as well.

Specifically, we propose the Efficient Cloud Detection Network (ECDNet), a new lightweight encoder-decoder neural network architecture that achieves performance comparable to generic state-of-the-art cloud detection methods. Inspired by Yu et al. [34], the encoder is designed as a two-pathway architecture. One pathway is the detail branch. To retain enough spatial information, it is designed with shallow layers and wide channels; to reduce the parameter count and computation cost, the lightweight GhostModule [9] is adopted in this branch. The other pathway is the semantic branch, which, unlike the detail branch, captures as much multi-scale semantic context information as possible. To obtain a large receptive field while keeping the network lightweight, we propose a dense pyramid module (DPM). In this module, each layer has fewer channels, and layer features are reused via dense connections to decrease computation cost. Inside each layer, a feature pyramid module extracts and concatenates features with different receptive fields. The proposed semantic branch consists of a stem block and two DPMs and incorporates a large context without increasing the parameter count. The outputs of the detail branch and the semantic branch are then fused via our proposed fusion module (FM) to obtain more comprehensive feature maps. Finally, a lightweight decoder takes intermediate results of the encoder to compensate for spatial features lost in downsampling, and resizes the feature maps to the original input size step by step.
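The channel bookkeeping behind dense feature reuse can be sketched as follows. The growth rate, layer count, and spatial size here are illustrative placeholders rather than the actual DPM configuration, and the convolutions are stubbed out with zero arrays since only the shapes matter:

```python
import numpy as np

# Shape-only sketch of dense feature reuse: each layer emits a small number
# of channels ("growth"), and every layer consumes the concatenation of all
# earlier outputs. All sizes are illustrative, not the paper's configuration.
growth, n_layers = 16, 4
x = np.zeros((1, 32, 64, 64))                 # (batch, channels, height, width)
features = [x]
for _ in range(n_layers):
    inp = np.concatenate(features, axis=1)    # reuse all previous feature maps
    out = np.zeros((1, growth, 64, 64))       # stand-in for a small conv layer
    features.append(out)

total = np.concatenate(features, axis=1)
print(total.shape)                            # 32 + 4 * 16 = 96 output channels
```

Because each layer adds only `growth` channels and reuses earlier maps instead of recomputing them, the per-layer width, and hence the parameter and FLOP budget, stays small even as the effective receptive field grows.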

The main contributions of our work can be summarized as follows:

  • a)

    We propose a neural network: ECDNet. It consists of a lightweight two-pathway encoder and an extremely lightweight decoder.

  • b)

    In the encoder, the dense pyramid module (DPM) is designed to have large and diverse receptive fields in feature extraction.

  • c)

    In the encoder, the fusion module (FM) is developed to fuse detail and semantic information more efficiently.

  • d)

    The experimental results on LandSat8 and MODIS demonstrate that ECDNet achieves state-of-the-art performance.

Section snippets

Cloud detection methods

The most straightforward way to distinguish clouds from other objects is to utilize differences in their spectral characteristics to calculate thresholds. Fmask [40], which used the Top of Atmosphere (TOA) reflectance and Brightness Temperature (BT) of Landsat images, produced a probability mask as the threshold for cloud detection. Wei et al. proposed an algorithm to dynamically determine a proper threshold [29]. And the database used in this algorithm is constructed with MODIS

Overview of efficient cloud detection network (ECDNet)

Fig. 1 depicts the Efficient Cloud Detection Network (ECDNet), which is based on an encoder-decoder architecture.

In the encoder, a two-path architecture is adopted to capture spatial and multi-scale context information separately. It ensures that feature diversity is taken into account, strengthening the expressive ability of the feature maps. The detail branch is designed as a 3-stage ResNet module with shallow layers to extract detailed spatial features. And the

Datasets and experiments setup

To evaluate the proposed ECDNet, we compare it with other state-of-the-art cloud detection and semantic segmentation approaches on two remote sensing image (RSI) datasets: LandSat8 and MODIS. We then conduct ablation experiments. The models are compared in terms of both performance and efficiency.

Conclusion and future work

In this paper, we propose a lightweight deep convolutional neural network method for cloud detection: ECDNet. It aims to tackle the onboard cloud detection problem on satellites. Considering that satellites have limited computation, storage, and power resources, ECDNet is designed as an extremely lightweight model with little performance degradation compared with existing state-of-the-art methods. The method is based on an encoder-decoder architecture. In the encoder,

Acknowledgments

This work was supported by the Shenzhen Science and Technology Program under Grant No. JCYJ20210324120208022 and Grant No. JCYJ20200109113014456.

Chen Luo is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from Xidian University, China, in 2011, and the M.Sc. degree from Hannover University, Germany, in 2014. Her research interests include deep learning, remote sensing, and computer vision.

References (40)

  • L.C. Chen et al.

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    Proceedings of the European Conference on Computer Vision (ECCV)

    (2018)
  • S. Choi et al.

    Cars can’t fly up in the sky: improving urban-scene segmentation via height-driven attention networks

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)
  • F. Chollet

    Xception: deep learning with depthwise separable convolutions

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • L. Di Girolamo et al.

    Cloud fraction errors caused by finite resolution measurements

    J. Geophys. Res.

    (1997)
  • G. Giuffrida et al.

    CloudScout: a deep neural network for on-board cloud detection on hyperspectral images

    Remote Sens.

    (2020)
  • K. Han et al.

    GhostNet: more features from cheap operations

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)
  • A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: efficient...
  • V. Kothari et al.

    The final frontier: deep learning in space

    Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications

    (2020)
  • H. Li et al.

    DFANet: deep feature aggregation for real-time semantic segmentation

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2019)
  • J. Li et al.

    A lightweight deep learning-based cloud detection method for sentinel-2a imagery fusing multiscale spectral and spatial features

    IEEE Trans. Geosci. Remote Sens.

    (2021)

    Shanshan Feng is currently an Associate Professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Nanyang Technological University, Singapore, in 2017. His research interests include sequential data mining and social network analysis.

    Xutao Li is currently an Associate Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. and Master's degrees in Computer Science from Harbin Institute of Technology in 2013 and 2009, respectively, and the Bachelor's degree from Lanzhou University of Technology in 2007. His research interests include data mining, machine learning, graph mining, and social network analysis, especially tensor-based learning and mining algorithms.

    Yunming Ye is currently a Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Shanghai Jiao Tong University, Shanghai, China, in 2004. His research interests include data mining, text mining, and ensemble learning algorithms.

    Baoquan Zhang is currently pursuing the Ph.D. degree with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, Weihai, China, in 2015, and the M.S. degree from the Harbin Institute of Technology, China, in 2017. His current research interests include meta learning, few-shot learning, and machine learning.

    Zhihao Chen is currently pursuing the M.Sc. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, China, in 2020. His research interests include machine learning, remote sensing, and computer vision.

    Yingling Quan is currently pursuing the M.Sc. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from the University of Science and Technology of China in 2021. Her research interests include machine learning, remote sensing, and computer vision.
