1 Introduction

Many countries treat oil reserves as a matter of national strategy, and oil tank detection is one of the most important tasks in the field of remote sensing. It is of great significance for security monitoring, disaster prevention and related applications. Recent military actions by some western countries against the oil depots of a terrorist organization effectively weakened the enemy's strength, further demonstrating the strategic importance of oil depots.

So far, most studies have focused only on the detection task itself. In fact, oil tanks fall roughly into three types: fixed cone-shaped crest, interior floating crest and exterior floating crest. The first two have a cone-shaped crest structure and mainly store light oil. The last has a flat crest structure, usually with a capacity of more than 10,000 \({m^3}\), and mainly stores heavy oil. In the recently built oil depots of China, most exterior floating crest oil tanks have a capacity of 100,000 \({m^3}\). These tanks therefore have greater strategic value. We thus divide oil tanks into two types (flat crest and cone-shaped crest) and take detecting and distinguishing them as our task.

Most studies are based on the circular shape of oil tanks. Zhang et al. [1] applied edge detection and the Hough transform [2] to images processed by the Brovey transform fusion method. Xu et al. [3] exploited the quasi-circular shadows and highlighted arcs in SAR images. Ok and Baseski [4] used the symmetry of circles. Zhao et al. [5] improved the Hough transform into a directional and weighted Hough voting method.

Zhang et al. [6] proposed a systematic three-step process: candidate selection, feature extraction, and classification. They used a fast and efficient algorithm called ELSD [7] (ellipse and line segment detector) for candidate selection, and an existing CNN model trained by Krizhevsky [8] in the ILSVRC2012 contest to extract features of the surroundings. An SVM is then applied to these features together with HOG, LBP and Gabor features. The same process was also used to detect seals [9].

In this paper we demonstrate how to detect oil tanks in optical satellite images and classify them into flat crest and cone-shaped crest types. The framework includes four steps: (1) prepare the dataset; (2) train the classifier; (3) extract candidate regions; and (4) classify the candidate regions. In step (3), a clustering procedure determines the final position and size of each potential target, which solves the double-detection problem (one target corresponding to multiple candidate regions).

2 Oil Tank Detection and Classification

The framework consists of four steps: (1) prepare the dataset; (2) train the classifier with the deep network that Krizhevsky [10] used on the CIFAR-10 dataset; (3) extract candidate regions with ELSD [7]; (4) apply the classifier to these candidate regions.

2.1 Prepare Dataset

The input data are optical satellite images in RGB mode. Optical images of oil depots in America, China, Japan and South Korea were collected from Google Earth as raw data, with resolutions between 0.5 and 0.9 m. Some of the collected images are cut into square pieces with a sliding window. To simulate different illumination conditions, the brightness of the images is adjusted in the HSV color space. To simulate different resolutions, the images are cut at different scales. Finally, all pieces are scaled to 32 \({\times }\) 32 to fit the input size of the deep network [10]. To support both detection and classification, the pieces are divided into three classes: flat crest tanks, cone-shaped crest tanks and non-tanks.
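The preparation steps above can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: it assumes an image stored as rows of (r, g, b) tuples with values in 0..255, and the helper names (`sliding_windows`, `crop`, `adjust_brightness`, `resize_nearest`), the window size, the stride and the nearest-neighbour resizing are all assumptions made for the example.

```python
import colorsys


def sliding_windows(height, width, size, stride):
    """Yield the (top, left) corner of every square window that fits in the image."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield top, left


def crop(image, top, left, size):
    """Cut a size x size piece out of an image stored as rows of (r, g, b) tuples."""
    return [row[left:left + size] for row in image[top:top + size]]


def adjust_brightness(image, factor):
    """Scale the V channel in HSV space, leaving hue and saturation unchanged."""
    result = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            v = min(1.0, v * factor)  # clip so the pixel stays in range
            new_row.append(tuple(round(ch * 255)
                                 for ch in colorsys.hsv_to_rgb(h, s, v)))
        result.append(new_row)
    return result


def resize_nearest(image, out_size=32):
    """Nearest-neighbour resize to out_size x out_size (assumed resampling method)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[i * in_h // out_size][j * in_w // out_size]
             for j in range(out_size)]
            for i in range(out_size)]
```

Cutting with several window sizes before calling `resize_nearest` plays the role of the different scales, and calling `adjust_brightness` with factors below and above 1 plays the role of the different illumination conditions.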

Table 1. The sizes of the training and validation sets. Another 35 original optical satellite images are used directly as the testing set. Two of them are shown in Fig. 5.

The final dataset contains 60101 labeled images in total, of which 15131 are flat crest tanks and 12618 are cone-shaped crest tanks. The dataset is split into training and validation sets, whose sizes are shown in Table 1. When training the classifier, cross validation is performed on the validation set. The other 35 original optical satellite images collected from Google Earth are used directly as the testing set.

2.2 Train the Classifier

The classifier is trained with the existing deep network [10] shown in Fig. 1. The network has three pooling layers: the first is max-pooling and the other two are mean-pooling. Max-pooling efficiently reduces computational complexity, while mean-pooling keeps more information about local regions, so the combination seeks a balance between computational cost and local detail. The network also contains ReLU (Rectified Linear Unit) and LRN (Local Response Normalization) layers around the pooling layers. ReLU is the activation function \(f(x) = max(0, x)\): it sets negative values to zero and keeps positive values unchanged. The resulting sparsity can accelerate training and, to some extent, prevent over-fitting. LRN imitates the lateral inhibition mechanism of the nervous system: it highlights the larger response values and thereby enhances the generalization ability of the model.
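The three operations described above can be illustrated on plain Python lists. The sketch below is not the network of [10]: the pooling here is non-overlapping, and the LRN function uses the across-channel formula and default constants (n = 5, k = 2, alpha = 1e-4, beta = 0.75) from the AlexNet paper purely as illustrative assumptions.

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)


def pool2d(feature_map, size, mode="max"):
    """Non-overlapping size x size pooling; mode is "max" or "mean"."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, size):
        pooled_row = []
        for left in range(0, cols - size + 1, size):
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(window))       # keeps the strongest response
            else:
                pooled_row.append(sum(window) / len(window))  # keeps the local average
        pooled.append(pooled_row)
    return pooled


def local_response_norm(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Across-channel LRN at one spatial position (illustrative constants)."""
    normalized = []
    for i, activation in enumerate(channels):
        lo = max(0, i - n // 2)
        hi = min(len(channels) - 1, i + n // 2)
        scale = k + alpha * sum(channels[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(activation / scale ** beta)
    return normalized
```

On a 2 x 4 map, max-pooling returns the strongest value of each 2 x 2 window while mean-pooling returns its average, which is the trade-off between compactness and local detail mentioned in the text.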

Fig. 1. The deep network structure (CNN) that Krizhevsky [10] used on the CIFAR-10 dataset.

The classifier is trained with the standard SGD (stochastic gradient descent) algorithm. Each layer performs two computations: a forward pass, which computes the output from the input for inference, and a backward pass, which computes the gradient from the loss for learning. The weights are initialized from a Gaussian distribution with mean 0 and variance 0.01, the biases are initialized to 0, and the learning rate is set to 0.001. A GPU is used to accelerate training. After training, the learned parameters roughly describe the distribution of the dataset and can be used as a classifier for other images.
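The forward pass, backward pass and update rule can be illustrated on a toy model. The sketch below is deliberately not a CNN: it assumes a single linear unit with a squared-error loss, and only the initialization (Gaussian with mean 0 and variance 0.01, i.e. standard deviation 0.1, zero bias) and the learning rate of 0.001 are taken from the text. All function names are hypothetical.

```python
import math
import random


def init_params(n_inputs, variance=0.01, seed=0):
    """Gaussian weights with mean 0 and the given variance; bias starts at 0."""
    rng = random.Random(seed)
    std = math.sqrt(variance)
    return [rng.gauss(0.0, std) for _ in range(n_inputs)], 0.0


def forward(weights, bias, x):
    """Forward pass: compute the output from the input."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def backward(prediction, target, x):
    """Backward pass: gradient of 0.5 * (prediction - target)^2."""
    error = prediction - target
    return [error * xi for xi in x], error


def total_loss(weights, bias, samples):
    return sum(0.5 * (forward(weights, bias, x) - y) ** 2 for x, y in samples)


def sgd(samples, epochs, learning_rate=0.001, seed=0):
    """Plain SGD: one parameter update per sample, samples reshuffled every epoch."""
    weights, bias = init_params(len(samples[0][0]), seed=seed)
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            grad_w, grad_b = backward(forward(weights, bias, x), y, x)
            weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
            bias -= learning_rate * grad_b
    return weights, bias
```

In a deep network the same two passes are chained through every layer, with the backward pass propagating the gradient from the loss to the earlier layers.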

2.3 Extract Candidate Regions

Existing algorithms for candidate selection in oil tank detection fall roughly into two categories: potential target extraction and circle detection. In the first category, a widely used method is selective search [11]. In the second, the Hough transform [2] and ELSD [7] are typical line segment and elliptical arc detectors that can be used for circle detection.

Fig. 2. (a) The ground truth; the image size is 1011 \({\times }\) 657. Flat crest tanks are marked with thick boxes and cone-shaped crest tanks with thin boxes. The other three panels show the candidate regions extracted with (b) selective search, (c) Hough transform and (d) ELSD. Tanks in black boxes are missed or not properly enclosed.

Selective Search. This method is based on a hierarchical grouping algorithm, optimized with three complementary strategies: color spaces, similarity measures and starting regions [11]. Here selective search is applied to an optical satellite image containing oil tanks of quite different scales. As shown in Fig. 2 (b), the method misses 9 out of 39 tanks: some are not found at all and the others are extracted with the wrong size. The miss rate exceeds 20 %, which greatly reduces the performance of the whole framework.

Hough Transform. The basic principle of the Hough transform [2] is to map lines in image space to accumulation points in parameter space by using the duality of points and lines, so as to test whether an image contains curves with given characteristics. The algorithm requires several parameters: \(min\_radius\) (minimal radius of the circles to search for), \(max\_radius\) (maximal radius of the circles to search for) and \(min\_dist\) (minimum distance between centers of the detected circles). To reduce false alarms effectively without missing any target, all target radii should lie between \(min\_radius\) and \(max\_radius\), and the distance between any pair of targets should be greater than \(min\_dist\). The need to supply these parameters manually already greatly reduces the generality and automation of the algorithm. Moreover, inaccurate parameters lead to a rapid increase in the number of candidate regions.

Taking Fig. 2 (c) as an example, even with 4 targets missed, the candidate regions of different tanks already overlap heavily. When the parameters are adjusted so that no target is missed, the image is covered by massive false alarms.
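To make the role of the three parameters concrete, the following is a brute-force circular Hough voting sketch. It is not the implementation used for Fig. 2 (c): the function names, the `min_votes` threshold and the angular sampling `n_angles` are assumptions, and edge pixels are assumed to be given as (row, col) pairs.

```python
import math
from collections import Counter


def circle_edge_points(cen_r, cen_c, radius, n_points=36):
    """Synthetic edge pixels on a circle, used only to exercise the sketch."""
    return {(round(cen_r + radius * math.sin(2.0 * math.pi * k / n_points)),
             round(cen_c + radius * math.cos(2.0 * math.pi * k / n_points)))
            for k in range(n_points)}


def hough_circles(edge_points, min_radius, max_radius, min_dist, min_votes,
                  n_angles=72):
    """Return (row, col, radius) peaks of a circular Hough accumulator."""
    votes = Counter()
    for row, col in edge_points:
        for radius in range(min_radius, max_radius + 1):
            # Every centre at distance `radius` from this edge pixel is a candidate.
            candidates = set()
            for k in range(n_angles):
                angle = 2.0 * math.pi * k / n_angles
                candidates.add((round(row - radius * math.sin(angle)),
                                round(col - radius * math.cos(angle)),
                                radius))
            votes.update(candidates)  # one vote per accumulator cell per edge pixel
    peaks = []
    for (cen_r, cen_c, radius), count in votes.most_common():
        if count < min_votes:
            break
        # Accept a peak only if it is at least min_dist from every accepted centre.
        if all(math.hypot(cen_r - pr, cen_c - pc) >= min_dist
               for pr, pc, _ in peaks):
            peaks.append((cen_r, cen_c, radius))
    return peaks
```

The accumulator grows with the width of the radius range, and a small `min_dist` lets neighbouring accumulator cells of the same tank all pass as separate detections, which is one way to see why poorly chosen parameters inflate the number of candidate regions.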

ELSD. ELSD [7] is a combined, parameter-free line segment and elliptical arc detector. It follows a 3-step scheme: candidate selection, candidate validation and model selection. Being parameter-free guarantees the automation and generality of object extraction. As shown in Fig. 2 (d), every target is enclosed by an appropriate box.

However, due to shadow, texture and other factors, there are many redundant regions. Zhang et al. [6] removed most of these regions with a trick that identifies all discontinuous arcs belonging to the same ring. Their trick needs to compute the distance between a center and every aligned pixel of a detected arc, which has high computational complexity. We instead use a concentric-circle rule that depends only on the center coordinates and radii rather than on all aligned pixels.

Step 1: validate whether two discontinuous arcs belong to the same ring.

$$\begin{aligned} Dist(a,\; b) = \sqrt{{(Cenr_a - Cenr_b)}^{2} + {(Cenc_a - Cenc_b)}^{2}}. \end{aligned}$$
(1)
$$\begin{aligned} \frac{Dist(a,\; b)}{max(r_a,\; r_b)} \le \theta ,\; \theta \ge 0. \end{aligned}$$
(2)
$$\begin{aligned} \frac{|r_a - r_b|}{max(r_a,\; r_b)} \le \sigma ,\; \sigma \ge 0. \end{aligned}$$
(3)

where Cenr and Cenc indicate the row and column index of a circle's center, respectively, and r is the circle's radius. \(\theta \) and \(\sigma \) are two constants, and the subscripts a and b denote two different circles. Formula (2) tests whether two circles share the same center and formula (3) tests whether they have the same radius. Because both are ratios normalized by the larger radius, they are robust to different resolutions.
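Formulas (1) to (3) translate directly into a small test. In this sketch a circle is assumed to be a (cen_r, cen_c, radius) tuple, the function names are hypothetical, and the values of \(\theta \) and \(\sigma \) must be supplied by the caller because the text does not state them.

```python
import math


def center_distance(a, b):
    """Formula (1): Euclidean distance between two circle centres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def same_ring(a, b, theta, sigma):
    """Formulas (2) and (3): same centre and same radius, relative to the larger radius."""
    larger = max(a[2], b[2])
    same_center = center_distance(a, b) / larger <= theta
    same_radius = abs(a[2] - b[2]) / larger <= sigma
    return same_center and same_radius
```

Scaling both circles by the same factor leaves both ratios unchanged, which is the resolution robustness noted above.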

Fig. 3. (a) The process for determining a single candidate region for each target. (b) The result of optimizing Fig. 2 (d) with the ratio trick [6]. (c) The result of applying the additional steps to (b).

Step 2: compute the aligned ratio, i.e. the proportion of aligned pixels among the total pixels of a ring, as proposed in [6].

$$\begin{aligned} R_{circle} = \frac{k_x(circle)}{l_{circle}}. \end{aligned}$$
(4)

where \(l_{circle}\) is the total number of pixels of the ring and \(k_x(circle)\) is the number of aligned pixels. A threshold on this ratio selects the validated candidates. Even after this step, double-detection still occurs: Fig. 3 (b) shows the result of applying this optimization to Fig. 2 (d). The additional steps illustrated in Fig. 3 (a) are therefore taken.
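Formula (4) and the thresholding step amount to a simple filter. The sketch below assumes each ring is described by a dictionary with hypothetical keys `aligned` and `total`; whether the comparison with the threshold is strict is also an assumption.

```python
def aligned_ratio(aligned_pixels, ring_pixels):
    """Formula (4): share of a ring's pixels that are aligned pixels."""
    if ring_pixels <= 0:
        raise ValueError("a ring must contain at least one pixel")
    return aligned_pixels / ring_pixels


def filter_rings(rings, threshold):
    """Keep rings whose aligned ratio reaches the threshold."""
    return [ring for ring in rings
            if aligned_ratio(ring["aligned"], ring["total"]) >= threshold]
```

A ring with 120 aligned pixels out of 300 has a ratio of 0.4 and is kept at a threshold of 0.3, while a ring with 40 aligned pixels out of 400 has a ratio of 0.1 and is discarded.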

Step 3: group candidate regions based on Jaccard similarity. Each group corresponds to a single target.

$$\begin{aligned} J(a,\; b) = \frac{|area_a \cap area_b|}{|area_a \cup area_b|} \; . \end{aligned}$$
(5)

where area indicates the area of a circle.
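For two circles, formula (5) has a closed form, because the overlap of two circles is a lens whose area follows from the radii and the center distance. The sketch below assumes circles given as (cen_r, cen_c, radius) tuples and groups them by linking every pair whose similarity reaches a threshold; the linking rule, the threshold value and the function names are assumptions, since the text does not specify how the groups are formed.

```python
import math


def _clamp(value):
    """Guard acos against rounding slightly outside [-1, 1]."""
    return max(-1.0, min(1.0, value))


def intersection_area(a, b):
    """Overlap area of two circles."""
    d = math.hypot(a[0] - b[0], a[1] - b[1])
    r1, r2 = a[2], b[2]
    if d >= r1 + r2:
        return 0.0                              # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2       # one circle inside the other
    part1 = r1 ** 2 * math.acos(_clamp((d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1)))
    part2 = r2 ** 2 * math.acos(_clamp((d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2)))
    part3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                            * (d - r1 + r2) * (d + r1 + r2))
    return part1 + part2 - part3


def jaccard(a, b):
    """Formula (5): intersection area over union area."""
    inter = intersection_area(a, b)
    union = math.pi * (a[2] ** 2 + b[2] ** 2) - inter
    return inter / union


def group_by_jaccard(circles, threshold):
    """Link circles whose similarity reaches the threshold; return index groups."""
    parent = list(range(len(circles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path compression
            i = parent[i]
        return i

    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            if jaccard(circles[i], circles[j]) >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(circles)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Two circles of radius 10 whose centers are one pixel apart have a similarity of about 0.88, so they fall into the same group, while a distant circle forms a group of its own.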

Step 4: perform two-class clustering on each group according to the center distance of circles.

If the summaries of the two clusters (the average center coordinates and radius of each cluster) satisfy formula (2) but not formula (3), the whole group is selected as the representative result, because the cluster with the shorter radius most likely corresponds to the ring contour on a flat crest tank and should not be dropped. Otherwise, the small cluster is most likely caused by shadows, so the large cluster is selected as the representative result. The box determined by the finally selected regions of a group is used as the only candidate region of that group. Figure 3 (c) shows the result of applying these steps to Fig. 3 (b): every target is properly marked with a single box.

2.4 Classification

After the candidate regions are scaled to 32 \({\times }\) 32, the classifier is applied to them. The classification performance is evaluated with a confusion matrix, using two indicators: precision and recall. For each class, they are defined as:

$$\begin{aligned} Precision = \frac{number\; of\; correctly\; classified\; targets}{number\; of\; predicted\; targets} \; . \end{aligned}$$
(6)
$$\begin{aligned} Recall = \frac{number\; of\; correctly\; classified\; targets}{number\; of\; actual\; targets} \; . \end{aligned}$$
(7)
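Formulas (6) and (7) can be read off a confusion matrix whose rows are actual labels and whose columns are predicted labels. The sketch below is a minimal illustration; the function name is hypothetical and the numbers in the usage note are made up for the example, not taken from the experiments.

```python
def precision_recall(confusion, index):
    """Per-class precision and recall; rows are actual labels, columns predicted."""
    correct = confusion[index][index]
    predicted = sum(row[index] for row in confusion)   # column sum
    actual = sum(confusion[index])                     # row sum
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    return precision, recall
```

For a made-up matrix `[[8, 1, 1], [2, 6, 2], [0, 1, 9]]`, the first class has precision 8/10 and recall 8/10, and the second class has precision 6/8 and recall 6/10.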

3 Experiments and Results

In total, 35 optical satellite images from different oil depots on Google Earth are selected as test data. Two of them are shown in Fig. 5. Together these 35 images contain 729 flat crest tanks and 894 cone-shaped crest tanks.

Fig. 4. Recall-precision graph for the ratio threshold in candidate selection. The ratio is the proportion of aligned pixels among the total pixels of a ring.

Fig. 5. (b) The final candidate regions extracted from (a). (c) The classification results for (b).

The optimization method in [6] for filtering out false alarms suggested a threshold of 0.4. Based on the experiment shown in Fig. 4, we select 0.3 as our threshold. With this setting, 2 flat crest tanks and 97 cone-shaped crest tanks are missed. Figure 5 (b) shows the final candidate regions extracted from Fig. 5 (a). The missed detections are mainly caused by corrosion and poor imaging conditions. In addition, the scales and shapes of cone-shaped crest tanks vary considerably, and both the smaller cross-sectional area of small tanks and the short distance between them make them easier to miss. These factors also challenge the classification step.

Figure 5 (c) shows the classification results. The confusion matrix (Table 2) shows that performance on flat crest tanks is better than on cone-shaped crest tanks, which is consistent with the analysis above. For the detection of all oil tanks, the precision and recall are 0.924 and 0.953, respectively. The performance in distinguishing the background appears poor because most false alarms have already been filtered out, leaving only a very small test set of non-tanks.

Table 2. Confusion matrix obtained from this experiment. The row header indicates actual labels and the column header indicates predicted labels.

4 Conclusion

In this paper, we demonstrate how to detect oil tanks in optical satellite images and classify them into two types (flat crest and cone-shaped crest). The ELSD+clustering framework is more suitable for candidate selection in oil tank detection than the Hough transform and selective search. The CNN can efficiently learn features from images and classify them. Experiments show that both detection and classification perform well, with a detection precision of 0.924 and recall of 0.953 over all oil tanks. One direction for future work is to further classify cone-shaped crest tanks into several classes based on crest texture and shape. Another is to extract high-level information about oil depots, such as depot scale or the storage structure of oil.