
Pattern Recognition

Volume 129, September 2022, 108713

ECDNet: A bilateral lightweight cloud detection network for remote sensing images

https://doi.org/10.1016/j.patcog.2022.108713

Highlights

  • We propose a neural network: ECDNet. It consists of a lightweight two-pathway encoder and an extremely lightweight decoder.

  • In the encoder, the dense pyramid module (DPM) is designed to have large and diverse receptive fields in feature extraction.

  • In the encoder, the fusion module (FM) is developed to fuse detail and semantic information more efficiently.

  • The experimental results on LandSat8 and MODIS demonstrate that ECDNet achieves state-of-the-art performance.

Abstract

Cloud detection is one of the critical tasks in remote sensing image pre-processing and has attracted extensive research interest. In recent years, deep neural network-based cloud detection methods have surpassed traditional methods (threshold-based methods and conventional machine learning-based methods). However, current approaches mainly focus on improving detection accuracy, while computational complexity and model size are ignored. To tackle this problem, we propose a lightweight deep learning cloud detection model: the Efficient Cloud Detection Network (ECDNet). The model is based on an encoder-decoder structure. In the encoder, a two-path architecture is proposed to extract spatial and semantic information concurrently. One pathway is the detail branch, designed to capture low-level spatial detail features with only a few parameters. The other pathway is the semantic branch, which mainly captures context features. In the semantic branch, a dense pyramid module (DPM) is designed for multi-scale contextual information extraction; its parameter count and computation are greatly reduced by feature reuse. Besides, a FusionBlock is developed to merge these two kinds of information. Then an extremely lightweight decoder recovers the cloud mask to the same scale as the input image step by step. To improve performance, a boost loss is introduced without any increase in inference cost. We evaluate the proposed method on two public datasets: LandSat8 and MODIS. Extensive experiments demonstrate that the proposed ECDNet achieves accuracy comparable to state-of-the-art cloud detection methods while having a much smaller model size and lower computational burden.

Introduction

Cloud detection is an essential step in remote sensing. Two-thirds of the earth's surface is covered by clouds [38]. When the earth's surface is the objective of satellite imaging, clouds are treated as noise and should be removed in the pre-processing phase. This phase ensures the high quality of various remote sensing applications, such as land cover classification [38], environment observation [17], and vegetation engineering [30]. However, the strict bandwidth constraints on downlink transmission conflict with transferring high-resolution hyperspectral images from satellites [8]. Recently, the trend of deploying remote sensing applications directly on-board satellites has attracted increasing attention [12]. Therefore, cloud detection methods that can be implemented and executed on satellites are in great demand.

In practice, traditional methods for the cloud detection problem can be divided into two groups: threshold-based methods and machine learning-based methods. Threshold-based approaches utilize the spectral and wavelength differences between clouds and other objects for cloud identification [40]. However, determining an appropriate threshold is computationally expensive. In addition, for multi-spectral satellite imagery with only four bands (red, green, blue, and near-infrared), threshold methods are usually not robust enough. To overcome this limitation, researchers adopted machine learning techniques for cloud detection, e.g., support vector machines [22] and random forests [7]. Unfortunately, their performance relies heavily on hand-crafted features, which usually contain insufficient distinguishable information. Therefore, it is difficult to discriminate clouds from other objects with such features, especially in complicated cases.
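As a toy illustration of the threshold-based family (not any specific published algorithm), a single-brightness-threshold cloud mask on a synthetic four-band patch might look like the sketch below; the band layout and the 0.6 cutoff are purely illustrative assumptions:

```python
import numpy as np

# Toy threshold-based cloud masking on a synthetic 4-band patch.
# Band order (red, green, blue, near-infrared) and the 0.6 cutoff are
# illustrative assumptions, not values from any published method.
rng = np.random.default_rng(0)
patch = rng.random((4, 8, 8))          # (bands, height, width), reflectance in [0, 1]
brightness = patch[:3].mean(axis=0)    # mean visible reflectance per pixel
cloud_mask = brightness > 0.6          # clouds tend to be bright in the visible bands

print(cloud_mask.shape)                # boolean mask, same spatial size as the patch
```

Real threshold algorithms combine many spectral tests and dynamically tuned cutoffs, which is exactly why a single fixed threshold like this one is not robust on four-band imagery.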

Recently, with the rise of deep learning and its success in computer vision [27] and remote sensing image processing [13], cloud detection methods based on convolutional neural networks (CNNs) have been widely investigated [20], [24], [31]. In these methods, features are extracted automatically via CNNs, and the derived methods have achieved significant performance improvements. However, these CNN-based models [31] tend to require a great number of parameters to achieve satisfactory performance, while satellites, as space devices, have limited onboard resources such as storage, computation, and power. Therefore, it is impractical to directly deploy existing CNN-based cloud detection methods on-board.

Since semantic segmentation is similar to cloud detection, e.g., both associate each pixel of an image with a class label [27], another possible solution is deploying lightweight semantic segmentation models for onboard cloud detection. In recent decades, researchers have achieved significant improvements [10], [27], [28], [39] in lightweight semantic segmentation for Natural Scene Images (NSIs). However, there is an obvious gap between NSIs and Remote Sensing Images (RSIs). First, objects in RSIs exhibit larger intra-class and smaller inter-class feature variance than those in NSIs. For example, because of the influence of the background, the features of thin clouds differ from those of thick clouds; besides, snow and ice clouds have quite similar features in most RSI bands [31]. Second, object boundaries, especially cloud boundaries, are unclear, e.g., in the thin cloud situation [6]. Therefore, it is unreliable to directly apply these semantic segmentation methods.

Recently, some researchers have paid attention to efficient cloud detection methods. In [8], a CNN-based model is proposed for nanosatellites to select eligible images to transmit to the ground. Even though the network is designed for low power consumption and low inference latency, its accuracy is low. In [16], a lightweight cloud detection network is designed for Sentinel-2A images. In this model, the number of parameters is reduced by using depthwise separable convolution and sharing kernels between channels in the feature extraction blocks. However, kernel sharing does not reduce the computational complexity, so the computation burden remains. In conclusion, challenges still exist in onboard cloud detection.
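To make the parameter savings from depthwise separable convolution concrete, here is a quick back-of-the-envelope weight count (biases ignored); the 64-channel, 3 x 3 configuration is an arbitrary example, not a setting from the paper:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel) followed
    by a 1 x 1 pointwise convolution that mixes channels (bias ignored)."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)        # 64 * 64 * 9 = 36864 weights
separable = separable_params(64, 64, 3)  # 64 * 9 + 64 * 64 = 4672 weights
print(standard, separable)               # roughly an 8x reduction in weights
```

Note that this saving is in parameters; as the paragraph above points out, tricks such as kernel sharing can shrink the weight count without reducing the number of multiply-accumulate operations actually executed.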

We observe that cloud pixels are usually not isolated from one another. Even though a thin cloud pixel has different features from a thick cloud pixel and its spectral feature is heavily influenced by the background, when it is surrounded by cloud pixels it is highly likely to be cloud. Therefore, multi-scale context information is crucial for cloud detection, and the first challenge is how to make full use of it. Since current lightweight cloud detection methods only focus on minimizing the number of parameters, the second challenge for effective and efficient onboard cloud detection is reducing the computational complexity as well.

Specifically, we propose the Efficient Cloud Detection Network (ECDNet), a new lightweight encoder-decoder neural network architecture that achieves performance comparable to generic state-of-the-art cloud detection methods. Inspired by Yu et al. [34], the encoder is designed as a two-pathway architecture. One pathway is the detail branch. To retain enough spatial information, it is designed with shallow layers and wide channels; to reduce the parameter count and computation cost, the lightweight GhostModule [9] is adopted in this branch. The other pathway is the semantic branch, which, unlike the detail branch, captures as much multi-scale semantic context information as possible. To obtain a large receptive field while keeping the network lightweight, we propose a dense pyramid module (DPM). In this module, each layer has fewer channels, and layer features are reused via dense connections to decrease computation cost. Inside each layer, a feature pyramid module extracts and concatenates features with different receptive fields. The proposed semantic branch consists of a stem block and two DPMs and incorporates a large context without increasing the parameter count. The outputs of the detail branch and the semantic branch are then fused via our proposed fusion module (FM) to obtain more comprehensive feature maps. Finally, a lightweight decoder takes intermediate results of the encoder to compensate for spatial features lost in downsampling, and resizes the feature maps to the original input size step by step.
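The channel bookkeeping behind dense feature reuse can be sketched as follows. The growth rate, layer count, and spatial size here are illustrative placeholders rather than the actual DPM configuration, and the convolutions are stubbed out with zero arrays since only the shapes matter:

```python
import numpy as np

# Shape-only sketch of dense feature reuse: each layer emits a small number
# of channels ("growth"), and every layer consumes the concatenation of all
# earlier outputs. All sizes are illustrative, not the paper's configuration.
growth, n_layers = 16, 4
x = np.zeros((1, 32, 64, 64))                 # (batch, channels, height, width)
features = [x]
for _ in range(n_layers):
    inp = np.concatenate(features, axis=1)    # reuse all previous feature maps
    out = np.zeros((1, growth, 64, 64))       # stand-in for a small conv layer
    features.append(out)

total = np.concatenate(features, axis=1)
print(total.shape)                            # 32 + 4 * 16 = 96 output channels
```

Because each layer adds only `growth` channels and reuses earlier maps instead of recomputing them, the per-layer width, and hence the parameter and FLOP budget, stays small even as the effective receptive field grows.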

The main contributions of our work can be summarized as follows:

  • a)

    We propose a neural network: ECDNet. It consists of a lightweight two-pathway encoder and an extremely lightweight decoder.

  • b)

    In the encoder, the dense pyramid module (DPM) is designed to have large and diverse receptive fields in feature extraction.

  • c)

    In the encoder, the fusion module (FM) is developed to fuse detail and semantic information more efficiently.

  • d)

    The experimental results on LandSat8 and MODIS demonstrate that ECDNet achieves state-of-the-art performance.

Section snippets

Cloud detection methods

The most straightforward way to distinguish clouds from other objects is to utilize differences in their spectral characteristics to calculate thresholds. Fmask [40], which used the Top of Atmosphere (TOA) reflectance and Brightness Temperature (BT) of Landsat images, produced a probability mask as the threshold for cloud detection. Wei et al. proposed an algorithm to dynamically determine a proper threshold [29]. And the database used in this algorithm is constructed with MODIS

Overview of efficient cloud detection network (ECDNet)

Fig. 1 depicts the Efficient Cloud Detection Network (ECDNet), which is based on an encoder-decoder architecture.

In the encoder, a two-path architecture is adopted to capture spatial and multi-scale context information separately. It ensures that feature diversity is taken into account, strengthening the expressive ability of the feature maps. The detail branch is designed as a 3-stage ResNet module with shallow layers to extract detailed spatial features. And the

Datasets and experiments setup

To evaluate the proposed ECDNet, we compare it with other state-of-the-art cloud detection and semantic segmentation approaches on two remote sensing image (RSI) datasets: LandSat8 and MODIS. We then conduct ablation experiments. The models are compared in terms of both performance and efficiency.

Conclusion and future work

In this paper, we propose a lightweight deep convolutional neural network method for cloud detection: ECDNet. It aims to tackle the onboard cloud detection problem on satellites. Considering that satellites have limited computation, storage, and power resources, ECDNet is designed as an extremely lightweight model with little performance degradation compared with existing state-of-the-art methods. The method is based on an encoder-decoder architecture. In the encoder,

Acknowledgments

This work was supported by the Shenzhen Science and Technology Program under Grant No. JCYJ20210324120208022 and Grant No. JCYJ20200109113014456.

Chen Luo is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from Xidian University, China, in 2011, and the M.Sc. degree from Hannover University, Germany, in 2014. Her research interests include deep learning, remote sensing, and computer vision.

References (40)

  • L.C. Chen et al.

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    Proceedings of the European Conference on Computer Vision (ECCV)

    (2018)
  • S. Choi et al.

    Cars can’t fly up in the sky: improving urban-scene segmentation via height-driven attention networks

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)
  • F. Chollet

    Xception: deep learning with depthwise separable convolutions

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • L. Di Girolamo et al.

    Cloud fraction errors caused by finite resolution measurements

    J. Geophys. Res.

    (1997)
  • G. Giuffrida et al.

    CloudScout: a deep neural network for on-board cloud detection on hyperspectral images

    Remote Sens.

    (2020)
  • K. Han et al.

    GhostNet: more features from cheap operations

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)
  • A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: efficient...
  • V. Kothari et al.

    The final frontier: deep learning in space

    Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications

    (2020)
  • H. Li et al.

    DFANet: deep feature aggregation for real-time semantic segmentation

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2019)
  • J. Li et al.

    A lightweight deep learning-based cloud detection method for sentinel-2a imagery fusing multiscale spectral and spatial features

    IEEE Trans. Geosci. Remote Sens.

    (2021)

    Shanshan Feng is currently an Associate Professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Nanyang Technological University, Singapore, in 2017. His research interests include sequential data mining and social network analysis.

    Xutao Li is currently an Associate Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. and Master's degrees in Computer Science from Harbin Institute of Technology in 2013 and 2009, respectively, and the Bachelor's degree from Lanzhou University of Technology in 2007. His research interests include data mining, machine learning, graph mining, and social network analysis, especially tensor-based learning and mining algorithms.

    Yunming Ye is currently a Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Shanghai Jiao Tong University, Shanghai, China, in 2004. His research interests include data mining, text mining, and ensemble learning algorithms.

    Baoquan Zhang is currently pursuing the Ph.D. degree with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, Weihai, China, in 2015, and the M.S. degree from the Harbin Institute of Technology, China, in 2017. His current research interests include meta learning, few-shot learning, and machine learning.

    Zhihao Chen is currently pursuing the M.Sc. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, China, in 2020. His research interests include machine learning, remote sensing, and computer vision.

    Yingling Quan is currently pursuing the M.Sc. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from the University of Science and Technology of China in 2021. Her research interests include machine learning, remote sensing, and computer vision.
