ECDNet: A bilateral lightweight cloud detection network for remote sensing images
Introduction
Cloud detection is an essential step in remote sensing. Two-thirds of the earth's surface is covered by clouds [38]. When the earth's surface is the objective of satellite imagery, clouds are treated as noise and should be removed in the pre-processing phase. This phase ensures the high quality of various remote sensing applications, such as land cover classification [38], environment observation [17], and vegetation engineering [30]. However, the strict bandwidth constraints on downlink transmission conflict with transferring high-resolution hyperspectral images from satellites [8]. Recently, the trend of deploying remote sensing applications directly on board satellites has attracted more and more attention [12]. Therefore, cloud detection methods that can be implemented and executed on satellites are in great demand.
In practice, traditional methods for cloud detection can be divided into two groups: threshold-based methods and machine learning-based methods. Threshold-based approaches utilize the spectral and wavelength differences between clouds and other objects for cloud identification [40]. However, the process of finding an appropriate threshold is computationally expensive. In addition, for multi-spectral satellite imagery with only four bands (red, green, blue, and near-infrared), threshold methods are usually not robust enough. To overcome this limitation, researchers have adopted machine learning techniques for cloud detection, e.g., support vector machines [22] and random forests [7]. Unfortunately, their performance relies heavily on hand-crafted features, which usually contain insufficient discriminative information, so it is difficult to distinguish clouds from other objects with them, especially in complicated cases.
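To make the threshold idea concrete, the following is a toy sketch, not any published algorithm: it flags pixels whose mean visible-band reflectance exceeds a fixed threshold on a 4-band (R, G, B, NIR) image. The band layout and the 0.6 threshold are made-up illustrative values.

```python
import numpy as np

def naive_cloud_mask(img, thresh=0.6):
    """img: H x W x 4 array of (R, G, B, NIR) reflectance in [0, 1].

    Toy threshold rule: clouds are bright in the visible bands, so flag
    any pixel whose mean visible reflectance exceeds `thresh`.
    """
    brightness = img[..., :3].mean(axis=-1)  # mean over R, G, B
    return brightness > thresh               # boolean cloud mask

# Synthetic scene: a dark background with one bright 2x2 "cloud" patch.
rng = np.random.default_rng(0)
img = rng.random((8, 8, 4)) * 0.3   # background reflectance below 0.3
img[2:4, 2:4, :3] = 0.9             # bright patch in the visible bands
mask = naive_cloud_mask(img)        # True only on the bright patch
```

A rule this simple is exactly what fails in the hard cases discussed above: bright snow or ice would be flagged as cloud, and dim thin cloud would be missed, which is why a single global threshold is not robust for four-band imagery.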
Recently, with the rise of deep learning and its success in computer vision [27] and remote sensing image processing [13], cloud detection methods based on convolutional neural networks (CNNs) have been widely investigated [20], [24], [31]. In these methods, features are extracted automatically via CNNs, and the resulting models have achieved significant performance improvements. However, these CNN-based models [31] tend to require a large number of parameters to achieve satisfactory performance, while a satellite, as a space device, has limited onboard resources such as storage, computation, and power. It is therefore impractical to directly deploy existing CNN-based cloud detection methods on board.
Since semantic segmentation is similar to cloud detection, in that both associate each pixel of an image with a class label [27], another possible solution is to deploy lightweight semantic segmentation models for onboard cloud detection. In recent decades, researchers have made significant progress [10], [27], [28], [39] in lightweight semantic segmentation for Natural Scene Images (NSIs). However, there is an obvious gap between NSIs and Remote Sensing Images (RSIs). First, in RSIs the intra-class feature variance of objects is larger, and the inter-class variance smaller, than in NSIs. For example, because of the influence of the background, the features of thin cloud differ from those of thick cloud, while snow/ice and cloud have quite similar features in most RSI bands [31]. Second, object boundaries, especially cloud boundaries, are unclear, e.g., in the thin cloud case [6]. Therefore, it is unreliable to directly apply semantic segmentation methods.
Recently, some researchers have paid attention to efficient cloud detection methods. In [8], a CNN-based model is proposed for nanosatellites to select eligible images for transmission to the ground. Even though the network is designed for low power consumption and low inference latency, its accuracy is low. In [16], a lightweight cloud detection network is designed for Sentinel-2A images. In this model, the number of parameters is reduced by using depthwise separable convolutions and by sharing kernels across channels in the feature extraction blocks. However, kernel sharing does not reduce computational complexity, so the computation burden remains. In conclusion, challenges still exist in onboard cloud detection.
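The distinction between parameter count and computation can be made explicit with back-of-the-envelope counts. The sketch below compares a standard convolution, a depthwise separable convolution, and a channel-shared kernel; the layer sizes (64 in/out channels, 3×3 kernel, 128×128 feature map) are illustrative assumptions, not the configuration of [16].

```python
def conv_cost(c_in, c_out, k, h, w):
    """Standard convolution: parameters and multiply-accumulates (MACs)."""
    params = c_in * c_out * k * k
    macs = params * h * w  # each weight fires once per output position
    return params, macs

def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Depthwise k x k conv per channel, then 1 x 1 pointwise conv."""
    dw_params = c_in * k * k    # one k x k kernel per input channel
    pw_params = c_in * c_out    # pointwise channel mixing
    params = dw_params + pw_params
    macs = params * h * w
    return params, macs

std = conv_cost(64, 64, 3, 128, 128)
sep = depthwise_separable_cost(64, 64, 3, 128, 128)

# Kernel sharing: one k x k kernel reused by all channels shrinks the
# parameter count to k*k (plus pointwise weights), but every channel still
# convolves with that kernel, so the MAC count stays that of a full conv.
shared_params = 3 * 3 + 64 * 64
shared_macs = std[1]
```

Depthwise separable convolution cuts both parameters and MACs, whereas kernel sharing cuts only parameters; this is the computation burden the text refers to.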
We observe that cloud pixels are usually not isolated from one another. Even though a thin cloud pixel has features different from those of thick cloud, and its spectral signature is heavily influenced by the background, when it is surrounded by cloud pixels it is highly likely to be cloud. Therefore, multi-scale context information is crucial for cloud detection, and the first challenge is how to make full use of it. Since the current lightweight cloud detection method focuses only on minimizing the number of parameters, the second challenge, for onboard cloud detection that is both effective and efficient, is to reduce the computational complexity as well.
Specifically, we propose the Efficient Cloud Detection Net (ECDNet), a new lightweight encoder-decoder neural network architecture that achieves performance comparable to generic state-of-the-art cloud detection methods. Inspired by Yu et al. [34], the encoder is designed as a two-pathway architecture. One pathway is the detail branch. To keep enough spatial information, it is designed with shallow layers and wide channels; to decrease the parameter count and computation cost, the lightweight GhostModule [9] is adopted in it. The other pathway is the semantic branch, which captures as much multi-scale semantic context information as possible. To obtain a large receptive field while keeping the network lightweight, we propose a dense pyramid module (DPM). In this module, each layer has fewer channels, and features are reused across layers via dense connections to decrease computation cost. Inside each layer, a feature pyramid module extracts and concatenates features with different receptive fields. The proposed semantic branch consists of a stem block and two DPMs, and incorporates a large context without increasing the parameter count. The outputs of the detail branch and semantic branch are then fused via our proposed fusion module (FM) to obtain more comprehensive feature maps. Finally, a lightweight decoder takes intermediate results of the encoder to compensate for the spatial features lost during downsampling, and resizes the feature maps back to the input size step by step.
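The saving the detail branch gets from the GhostModule can be sketched with parameter counts following the GhostNet formulation [9]: a fraction 1/s of the output maps is produced by an ordinary convolution and the rest by cheap depthwise operations. The channel sizes and the values s=2, d=3 below are illustrative assumptions, not ECDNet's actual configuration.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Ordinary convolution producing all c_out feature maps directly."""
    return c_in * c_out * k * k

def ghost_module_params(c_in, c_out, k=3, s=2, d=3):
    """GhostModule-style count: c_out/s 'intrinsic' maps from a regular
    conv, the remaining maps from cheap d x d depthwise operations."""
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k   # ordinary convolution
    cheap = (s - 1) * intrinsic * d * d  # depthwise "cheap operations"
    return primary + cheap

std = standard_conv_params(64, 128)    # 73728 weights
ghost = ghost_module_params(64, 128)   # 37440 weights
```

With s=2 the module needs roughly half the weights of the standard convolution, which is why it suits a wide-channel detail branch under tight onboard budgets.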
The main contributions of our work can be summarized as follows:
- a)
We propose a neural network: ECDNet. It consists of a lightweight two-pathway encoder and an extremely lightweight decoder.
- b)
In the encoder, the dense pyramid module (DPM) is designed to have large and diverse receptive fields in feature extraction.
- c)
In the encoder, the fusion module (FM) is developed to fuse detail and semantic information more efficiently.
- d)
The experimental results on Landsat 8 and MODIS demonstrate that ECDNet achieves state-of-the-art performance.
Cloud detection methods
The most straightforward way to distinguish cloud from other objects is by utilizing differences in their spectral characteristics to calculate thresholds. Fmask [40], which used the Landsat Top of Atmosphere (TOA) reflectance and Brightness Temperature (BT) of Landsat images, produced a probability mask as the threshold for cloud detection. Wei et al. proposed an algorithm to dynamically determine a proper threshold [29]. And the database used in this algorithm is constructed with MODIS
Overview of efficient cloud detection network (ECDNet)
The Efficient Cloud Detection Network (ECDNet), depicted in Fig. 1, is based on an encoder-decoder architecture.
In the encoder, a two-path architecture is adopted to capture spatial and multi-scale context information separately. This ensures that the diversity of features is taken into account to strengthen the expressive ability of the feature maps. The detail branch is designed as a 3-stage ResNet module with shallow layers to extract detailed spatial features. And the
Datasets and experiments setup
To evaluate the proposed ECDNet, we compare it with other state-of-the-art cloud detection and semantic segmentation approaches on two remote sensing image (RSI) datasets: Landsat 8 and MODIS. We then conduct ablation experiments. The models are compared in terms of both performance and efficiency.
Conclusion and future work
In this paper, we propose a lightweight deep convolutional neural network method for cloud detection: ECDNet. It aims to tackle the onboard cloud detection problem on satellites. Considering that satellites have limited computation, storage, and power resources, ECDNet is designed as an extremely lightweight model with little performance degradation in comparison with existing state-of-the-art methods. The method is based on an encoder-decoder architecture. In the encoder,
Acknowledgments
This work was supported by the Shenzhen Science and Technology Program under Grant No. JCYJ20210324120208022 and Grant No. JCYJ20200109113014456.
Chen Luo is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from Xidian University, China in 2011, and received M.Sc. degree from Hannover University, Germany in 2014. Her research interests include deep learning, remote sensing and computer vision.
References (40)
- et al., Introducing two random forest based methods for cloud detection in remote sensing images, Adv. Space Res. (2018).
- et al., A cloud detection algorithm for satellite imagery based on deep learning, Remote Sens. Environ. (2019).
- et al., Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognit. (2017).
- et al., Detection of cloud cover using dynamic thresholds and radiative transfer models from the polarization satellite image, J. Quant. Spectrosc. Radiat. Transf. (2019).
- et al., Remote sensing of atmospheric particulate mass of dry PM2.5 near the ground: method validation using ground-based measurements, Remote Sens. Environ. (2016).
- et al., Deep learning in remote sensing applications: a meta-analysis and review, ISPRS J. Photogramm. Remote Sens. (2019).
- et al., Efficient semantic segmentation with pyramidal fusion, Pattern Recognit. (2021).
- et al., Object-based cloud and cloud shadow detection in Landsat imagery, Remote Sens. Environ. (2012).
- et al., Cloud detection for high-resolution satellite imagery using machine learning and multi-feature fusion, Remote Sens. (2016).
- L.C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv...
- Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV).
- Cars can't fly up in the sky: improving urban-scene segmentation via height-driven attention networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Xception: deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Cloud fraction errors caused by finite resolution measurements, J. Geophys. Res.
- CloudScout: a deep neural network for on-board cloud detection on hyperspectral images, Remote Sens.
- GhostNet: more features from cheap operations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- The final frontier: deep learning in space, in: Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications.
- DFANet: deep feature aggregation for real-time semantic segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- A lightweight deep learning-based cloud detection method for Sentinel-2A imagery fusing multiscale spectral and spatial features, IEEE Trans. Geosci. Remote Sens.
Shanshan Feng is currently an Associate Professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Nanyang Technological University, Singapre, in 2017. His research interests include sequential data mining and social network analysis.
Xutao Li is currently an Associate Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. and Master degrees in Computer Science from Harbin Institute of Technology in 2013 and 2009, and the Bachelor from Lanzhou University of Technology in 2007. His research interests include data mining, machine learning, graph mining, and social network analysis, especially tensor-based learning and mining algorithms.
Yunming Ye is currently a Professor with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the Ph.D. degree in Computer Science from Shanghai Jiao Tong University, Shanghai, China, in 2004. His research interests include data mining, text mining, and ensemble learning algorithms.
Baoquan Zhang is currently pursuing the Ph.D. degree with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, Weihai, China, in 2015, and the M.S. degree from the Harbin Institute of Technology, China, in 2017. His current research interests include meta learning, few-shot learning, and machine learning.
Zhihao Chen is currently pursuing the M.Sc degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. He received the B.S. degree from the Harbin Institute of Technology, China, in 2020. His research interests include machine learning, remote sensing and computer vision.
Yingling Quan is currently pursuing the M.Sc degree with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. She received the B.Sc. degree from University of Science and Technology of China, China in 2021. Her research interests include machine learning, remote sensing and computer vision.