
1 Introduction

Long-distance power transmission relies mainly on overhead transmission lines, which carry enormous amounts of power. The safe operation of these lines therefore has a significant influence on the stability of the power grid. With the unprecedentedly rapid development of urban construction, construction operations have spread widely, even into rural areas. Consequently, external damage, caused mainly by illegal construction work with engineering vehicles, has become a major threat to the safe and stable operation of transmission lines. Because power grids are enormous in scale, automatic and accurate transmission line inspection techniques are in high demand. Recently, many image recognition based algorithms have been proposed [3, 16] to address this issue. However, these existing techniques still suffer from low accuracy and high costs.

The past few years have witnessed the impressive success of deep learning in many computer vision tasks. Deep learning based object detection also shows great potential for the automatic inspection of transmission lines, including the detection and localization of engineering vehicles. Current object detection algorithms can be roughly divided into two categories: one-stage detectors [2, 10,11,12] and two-stage detectors [1, 4,5,6, 8, 13]. One-stage detectors are applied over a regular, dense sampling of possible object locations. Although one-stage detectors are faster and simpler, their accuracy has so far trailed that of two-stage detectors. In two-stage detectors, the first stage generates a set of candidate object locations and the second stage classifies each candidate as one of the foreground classes or as background. Because the state-of-the-art two-stage Faster R-CNN framework can be trained efficiently on large-scale datasets and achieves top accuracy, we adopt it to detect external damage risks to transmission lines. External damage risk detection often suffers from complex backgrounds and small objects, and its accuracy is sensitive to variations in illumination and viewing angle. In practical applications, many false negatives may be produced, which is a major obstacle for our task.

The main causes of false negatives in our task are data imbalance and a shortage of hard examples. Although the Faster R-CNN model is elegant, learning effectively from such data remains an open problem. In Fast R-CNN and Faster R-CNN, it is difficult to balance foreground and background even though the foreground-to-background ratio is fixed at 1:3 in the second (classification) stage, and training is dominated by the easily classified examples that random sampling produces. Sampling with a fixed 1:3 ratio is therefore not an effective way to tackle data imbalance. Shrivastava, Gupta and Girshick proposed Online Hard Example Mining (OHEM) [14] to cope with data imbalance. OHEM integrates the bootstrapping technique [15] with region-based detectors and can be implemented effortlessly on most of them. However, OHEM only increases the weight of hard examples while ignoring easily classified examples. To address this issue, we propose a variant of OHEM, named Enhanced Online Hard Example Mining (E-OHEM), which balances hard and easily classified examples in region-based detectors. On the PASCAL VOC2007 dataset, Fast R-CNN with E-OHEM outperforms Fast R-CNN with OHEM by 0.4% mAP. E-OHEM also boosts the performance of Faster R-CNN, surpassing Faster R-CNN with OHEM by 0.6% mAP.

Based on the Faster R-CNN framework, we detect and locate engineering vehicles such as excavators, cement tankers, cement pump trucks, scoops, tower cranes, cranes, bulldozers and engineering cars. With the E-OHEM algorithm and the fine-tuned model, we achieve an mAP of 73.2% on our external damage risk detection dataset.

2 Related Work

2.1 The Framework of Transmission Lines External Damage Risk Detection System

As shown in Fig. 1, we use Faster R-CNN as the basic framework for external damage risk detection of transmission lines. First, we build our dataset and annotate the engineering vehicle samples. Second, we train the RPN on these samples until the network converges. Then, we extract bounding boxes with the trained binary-class RPN model and train Fast R-CNN with the E-OHEM algorithm until the network converges. Finally, the trained model is used to detect and classify engineering vehicles.

Fig. 1. The framework of transmission lines external damage risk detection

Fig. 2. Selecting negative examples mechanism in RPN stage

2.2 Sampling Heuristics

Balancing Foreground and Background RoIs in Mini-batch Sampling. Fast R-CNN and Faster R-CNN are trained with stochastic gradient descent (SGD), and the SGD mini-batches are constructed so that RoIs from the same image share convolutional computation. For each mini-batch, N images are first sampled from the dataset, and then B / N RoIs (B = 128, N = 2, i.e., 64 RoIs per image) are sampled from each image. Each RoI is labeled as foreground or background according to its intersection over union (IoU) overlap with the ground-truth bounding boxes. Figure 3 shows the architecture of Faster R-CNN. To handle data imbalance, [4] proposed a heuristic that fixes the foreground-to-background ratio in each mini-batch at 1:3, which ensures that 25% of a mini-batch is foreground.
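The fixed-ratio heuristic can be illustrated with a minimal NumPy sketch for a single image. It assumes the common Fast R-CNN labeling thresholds (foreground if the maximum IoU with any ground-truth box is at least 0.5, background if it lies in [0.1, 0.5)), which are not stated above; the function name `sample_fixed_ratio` is hypothetical.

```python
import numpy as np

def sample_fixed_ratio(max_overlaps, rois_per_image=64, fg_fraction=0.25,
                       fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.1,
                       rng=None):
    """Sample one image's RoIs with a fixed foreground fraction.

    max_overlaps[i] is the largest IoU of RoI i with any ground-truth box.
    Returns (indices, labels) with label 1 = foreground, 0 = background.
    Thresholds are assumed Fast R-CNN-style defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_overlaps = np.asarray(max_overlaps)
    fg_inds = np.flatnonzero(max_overlaps >= fg_thresh)
    bg_inds = np.flatnonzero((max_overlaps < bg_thresh_hi) &
                             (max_overlaps >= bg_thresh_lo))
    # At most 25% foreground (16 of B/N = 64); fewer if the image lacks them.
    num_fg = min(int(round(fg_fraction * rois_per_image)), fg_inds.size)
    fg_keep = rng.choice(fg_inds, size=num_fg, replace=False)
    # Fill the rest of the per-image budget with background RoIs.
    num_bg = min(rois_per_image - num_fg, bg_inds.size)
    bg_keep = rng.choice(bg_inds, size=num_bg, replace=False)
    keep = np.concatenate([fg_keep, bg_keep])
    labels = np.concatenate([np.ones(num_fg, dtype=int),
                             np.zeros(num_bg, dtype=int)])
    return keep, labels

# Example: 2000 RoIs with random overlaps give 16 foreground + 48 background.
rng = np.random.default_rng(0)
keep, labels = sample_fixed_ratio(rng.uniform(0.0, 1.0, size=2000), rng=rng)
print(len(keep), int(labels.sum()))
```

Because random sampling ignores the loss, most of the 64 RoIs drawn this way tend to be examples the network already classifies well, which is the weakness OHEM targets.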

Fig. 3. Architecture of the Faster R-CNN algorithm

Online Hard Example Mining. However, most of the randomly sampled examples in a mini-batch are easily classified. OHEM not only learns from hard examples more efficiently but also removes the need for several heuristics and hyperparameters. As shown in Fig. 4, a convolutional feature map is first computed for the input image by the convolutional network. The RoI network then performs a forward pass using this feature map and all the input RoIs (R), instead of a sampled mini-batch. Next, the loss of each RoI is computed and non-maximum suppression (NMS) is applied to the RoIs according to their losses. Hard examples are selected by sorting the input RoIs by loss and taking the B / N (B = 128, N = 2) examples on which the current network performs worst.
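A minimal sketch of OHEM's selection step is given below, assuming the per-RoI losses have already been produced by the forward pass over all RoIs. The loss-based NMS threshold of 0.7 is an assumed value, boxes use continuous `[x1, y1, x2, y2]` coordinates, and `box_iou` and `ohem_select` are hypothetical names.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box with each row of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def ohem_select(rois, losses, num_hard=64, nms_thresh=0.7):
    """Indices of up to num_hard hard RoIs after loss-based NMS."""
    rois = np.asarray(rois, dtype=float)
    order = np.argsort(-np.asarray(losses, dtype=float))  # highest loss first
    keep = []
    while order.size > 0 and len(keep) < num_hard:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Drop lower-loss RoIs that overlap the selected one too much.
        order = rest[box_iou(rois[i], rois[rest]) <= nms_thresh]
    return np.array(keep, dtype=int)

rois = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [50, 50, 60, 60]], dtype=float)
# RoI 1 overlaps RoI 0 with IoU 0.9, so it is suppressed: prints [0 2].
print(ohem_select(rois, [0.9, 0.8, 0.1], num_hard=2))
```

The NMS step keeps the mining from spending the whole budget on many near-duplicate RoIs around one difficult region.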

Fig. 4. Architecture of Faster R-CNN using OHEM algorithm

3 Model Design

In this section, we argue that online hard example mining focuses only on hard examples, and we show that our approach leads to better training and higher average precision. We first discuss the design motivation, then present the design and implementation of the Enhanced Online Hard Example Mining (E-OHEM) algorithm, and finally propose a method for fine-tuning the Faster R-CNN model.

3.1 Motivation

In Sect. 2.2, we introduced mini-batch sampling with a fixed foreground-to-background ratio (1:3), in which most of the selected examples are easily classified, and the OHEM algorithm, which selects more hard examples. Although OHEM achieves high detection accuracy, data imbalance still exists: OHEM does not consider the ratio of positive to negative examples and completely ignores easily classified examples. Additionally, the experiments in [9] found that the benefit of a fixed 1:3 positive-to-negative ratio is limited and that it can reduce recognition accuracy. Thus, the key aspect of the data imbalance in OHEM is that easily classified examples are discarded entirely. We therefore propose a method to increase the weight of easily classified examples. Since most examples drawn with a fixed 1:3 ratio in mini-batch sampling are easily classified, we add this sampled mini-batch to the OHEM mini-batch.

3.2 Enhanced Online Hard Example Mining Algorithm

We propose an effective Enhanced Online Hard Example Mining algorithm for training Fast R-CNN and Faster R-CNN that overcomes OHEM's neglect of easily classified examples. E-OHEM fuses mini-batch sampling with balanced foreground and background RoIs and the OHEM algorithm, and it can be implemented effortlessly on most region-based detectors. The architecture of Faster R-CNN using E-OHEM is shown in Fig. 5. E-OHEM proceeds as follows.

Forward Network Calculation. In the VGG16 network, the RoI pooling layer takes the conv5_3 convolutional feature map and all input RoIs (about 2000) and computes the forward pass through the fully-connected layers. The loss of each RoI is computed, and non-maximum suppression (NMS) then iteratively selects the RoI with the highest loss and removes all lower-loss RoIs that overlap highly with the selected region. B / N (B = 128, N = 2) hard examples are chosen by sorting the remaining RoIs by loss.

RoI Gradient Updating. All input RoIs (about 2000) are labeled as foreground or background samples according to the IoU threshold. B / N (B = 128, N = 2) easily classified examples (\(R_{sel}\)) are then chosen by sampling foreground and background at a 1:3 ratio. Finally, the \(2B/N\) (= 128) examples (\(R_{merge-sel}\)) formed by merging the hard examples and the easily classified examples are used for the gradient update; the full procedure is given in Algorithm 1.
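The merge step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: loss-based NMS is omitted from the hard-example branch for brevity, the foreground/background thresholds (0.5 and 0.1) are assumed Fast R-CNN defaults, and RoIs chosen by both branches are kept twice, since the text does not say how overlaps between \(R_{sel}\) and the hard examples are handled. `e_ohem_select` and `merged_loss` are hypothetical names.

```python
import numpy as np

def e_ohem_select(losses, max_overlaps, rois_per_image=64, fg_fraction=0.25,
                  fg_thresh=0.5, bg_thresh_lo=0.1, rng=None):
    """Return the merged RoI indices used for one image's gradient update.

    Hard part: the B/N highest-loss RoIs (loss-based NMS omitted here).
    Sampled part (R_sel): B/N RoIs drawn at a 1:3 foreground-to-background ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses)
    max_overlaps = np.asarray(max_overlaps)

    # 1) Hard examples, as in OHEM.
    hard = np.argsort(-losses)[:rois_per_image]

    # 2) Mostly easy examples from the fixed-ratio heuristic.
    fg = np.flatnonzero(max_overlaps >= fg_thresh)
    bg = np.flatnonzero((max_overlaps < fg_thresh) &
                        (max_overlaps >= bg_thresh_lo))
    n_fg = min(int(round(fg_fraction * rois_per_image)), fg.size)
    n_bg = min(rois_per_image - n_fg, bg.size)
    sampled = np.concatenate([rng.choice(fg, size=n_fg, replace=False),
                              rng.choice(bg, size=n_bg, replace=False)])

    # 3) Merge into R_merge-sel (up to 2*B/N = 128 RoIs). A RoI found by both
    #    parts appears twice, so its loss counts twice in merged_loss below.
    return np.concatenate([hard, sampled])

def merged_loss(losses, merged_idx):
    """Mean loss over the merged RoIs, i.e. what would be backpropagated."""
    return float(np.mean(np.asarray(losses)[merged_idx]))

rng = np.random.default_rng(0)
losses = rng.uniform(size=2000)      # stand-in per-RoI losses
overlaps = rng.uniform(size=2000)    # stand-in max IoU with ground truth
idx = e_ohem_select(losses, overlaps, rng=rng)
print(len(idx), merged_loss(losses, idx))
```

Compared with plain OHEM, only step 2 and the concatenation are new; the easy examples keep a nonzero share of the gradient instead of being dropped.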

Fig. 5. Architecture of Faster R-CNN using E-OHEM algorithm


3.3 The Fine-Tuning Model Method

In this study, we propose a fine-tuning method to reduce the false positives of the Faster R-CNN model. As shown in Fig. 2, the procedure is as follows.

Selecting Bounding Boxes. First, we run the trained model on the training set and keep the bounding boxes whose detection score exceeds 0.05. Then, we perform non-maximum suppression (NMS) on these bounding boxes with a threshold of 0.3. Finally, at most 2000 bounding boxes are kept for each image.

Saving False Positives. We determine whether each bounding box obtained in the previous step is a false positive by computing its intersection over union (IoU) overlaps with all ground-truth bounding boxes of the image. A bounding box whose IoU overlap is lower than 0.5 is a false positive and is saved to a text file.
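The two steps above can be sketched in NumPy as follows, assuming detections are given as `[x1, y1, x2, y2]` boxes with one confidence score each, that 0.05 is a score threshold, and that a box counts as a false positive when its best IoU with any ground-truth box is below 0.5. `iou_matrix`, `nms` and `mine_false_positives` are hypothetical helper names.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (M, 4) and b (K, 4) as [x1, y1, x2, y2]."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def nms(boxes, scores, thresh):
    """Greedy score-based NMS; returns kept indices, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou_matrix(boxes[i:i + 1], boxes[rest])[0] <= thresh]
    return np.array(keep, dtype=int)

def mine_false_positives(boxes, scores, gt_boxes, score_thresh=0.05,
                         nms_thresh=0.3, max_boxes=2000, fp_iou=0.5):
    """Detections on one training image whose best IoU with ground truth < 0.5."""
    mask = scores > score_thresh
    boxes, scores = boxes[mask], scores[mask]
    boxes = boxes[nms(boxes, scores, nms_thresh)[:max_boxes]]
    if len(gt_boxes) == 0:
        return boxes  # no ground truth: every remaining detection is a false positive
    best_iou = iou_matrix(boxes, gt_boxes).max(axis=1)
    return boxes[best_iou < fp_iou]

boxes = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.6])
gt = np.array([[0, 0, 10, 10]], dtype=float)
print(mine_false_positives(boxes, scores, gt))  # only the box at (50, 50, 60, 60)
```

The returned boxes could then be written out per image (for example with `np.savetxt`) to serve as the false positive list used in the fine-tuning stage.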

Training Parameter Settings. Following the alternate optimization training of Faster R-CNN, the trained Faster R-CNN model is used as the pre-trained model. The initial learning rates of both the RPN and Fast R-CNN stages are set to 0.0001, one order of magnitude lower than in the original training.

Selection Mechanism of Negative Samples in Training. In the original RPN stage, Faster R-CNN selects samples according to the IoU overlap between anchor boxes and ground-truth bounding boxes. Anchor boxes with an IoU overlap lower than 0.3 are taken as negative samples (\(Neg_{sel}\)) (box (b) in Fig. 2), and anchor boxes with an IoU overlap higher than 0.7 are taken as positive samples (box (c) in Fig. 2). In the fine-tuning stage, we propose a negative sample selection mechanism to overcome false positives, described as follows:

$$Neg = \alpha \, Neg_{sel} + \beta \, Neg_{fp} \tag{1}$$

where \(Neg_{fp}\) denotes the anchor boxes that have an IoU overlap higher than 0.7 with any false positive box, and \(Neg\) denotes the total set of negative samples. As displayed in Fig. 2, we assign different weights to \(Neg_{sel}\) and \(Neg_{fp}\). In our experiments, setting \(\alpha :\beta \) to 3:1 gave higher recognition accuracy.
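One possible reading of Eq. (1) is that \(\alpha\) and \(\beta\) set the proportions in which an image's negative anchor budget is filled from the two pools. The sketch below follows that reading, which is an assumption, as is the budget of 128 negatives in the example call; `sample_rpn_negatives` is a hypothetical name.

```python
import numpy as np

def sample_rpn_negatives(neg_sel_inds, neg_fp_inds, num_neg, alpha=3, beta=1,
                         rng=None):
    """Draw num_neg negative anchors split alpha:beta between the two pools.

    neg_sel_inds: anchors with IoU < 0.3 to every ground-truth box (Neg_sel).
    neg_fp_inds:  anchors with IoU > 0.7 to some saved false positive (Neg_fp).
    """
    rng = np.random.default_rng() if rng is None else rng
    neg_fp_inds = np.unique(np.asarray(neg_fp_inds))
    # Avoid drawing the same anchor from both pools.
    neg_sel_inds = np.setdiff1d(neg_sel_inds, neg_fp_inds)
    # beta / (alpha + beta) of the negatives come from Neg_fp (1/4 for 3:1)...
    n_fp = min(int(round(num_neg * beta / (alpha + beta))), neg_fp_inds.size)
    # ...and the remainder from Neg_sel, which also absorbs any shortfall.
    n_sel = min(num_neg - n_fp, neg_sel_inds.size)
    return np.concatenate([rng.choice(neg_fp_inds, size=n_fp, replace=False),
                           rng.choice(neg_sel_inds, size=n_sel, replace=False)])

rng = np.random.default_rng(0)
# Anchors 0..999 form Neg_sel and anchors 1000..1099 form Neg_fp.
negs = sample_rpn_negatives(np.arange(1000), np.arange(1000, 1100),
                            num_neg=128, rng=rng)
print(len(negs), int((negs >= 1000).sum()))  # 128 32
```

With \(\alpha:\beta = 3:1\), a quarter of the negatives are anchors lying on regions the previous model mistook for vehicles, so the fine-tuned model sees its own errors explicitly as background.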

Fig. 6. Comparing training loss for E-OHEM and OHEM in Fast R-CNN

Fig. 7. Precision-Recall curve comparison in testing sets of external damage risk detection

4 Experiments and Results

In this section, we conduct experiments to evaluate the proposed E-OHEM and compare it with OHEM. We also evaluate the fine-tuning method and analyze the results of external damage risk detection. We describe the experimental setup in Sect. 4.1, and then demonstrate the efficiency and accuracy of E-OHEM by examining the training loss and average precision in Sect. 4.2. In addition, we report the results of the fine-tuned model on PASCAL VOC2007. Finally, we analyze and compare the results of external damage risk detection.

4.1 Experimental Setup

In this paper, we use the VGG16 convolutional neural network in the Caffe [7] framework and evaluate the algorithms on PASCAL VOC2007, training on the trainval set and testing on the test set. All models are trained with stochastic gradient descent (SGD) with an initial learning rate of 0.001. We use gradient accumulation with \(N=2\) forward-backward passes of single-image mini-batches and \(B=128\). Our method does not exploit popular improvements such as multi-scale training, multi-scale testing or stronger data augmentation [10]. In the Fast R-CNN experiment, the model is trained for 40K iterations, and the learning rate is dropped in “steps” by a factor of 0.1 every 30K iterations. In the Faster R-CNN experiment, we employ the alternate optimization training method. The RPN stage runs for 80K iterations with the learning rate dropped in “steps” by a factor of 0.1 every 60K iterations, and the Fast R-CNN stage runs for 40K iterations with the learning rate dropped by a factor of 0.1 every 30K iterations.
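The “step” schedules above correspond to Caffe's `step` learning-rate policy, lr = base_lr · gamma^floor(iter / stepsize). The small function below only mirrors that formula (it does not call Caffe), and its name is hypothetical.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=30000):
    """Step policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Schedules described in the text: (total iterations, step size).
schedules = {
    "Fast R-CNN": (40000, 30000),
    "Faster R-CNN, RPN stage": (80000, 60000),
    "Faster R-CNN, Fast R-CNN stage": (40000, 30000),
}
for name, (max_iter, stepsize) in schedules.items():
    first = step_lr(0, stepsize=stepsize)
    last = step_lr(max_iter - 1, stepsize=stepsize)
    print(f"{name}: lr {first:g} -> {last:g}")
```

In every schedule the single drop happens at three quarters of the run, so the final quarter of training uses a learning rate of 0.0001.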

Table 1. VOC 2007 test detection average precision (%). All methods use VGG16 and bounding-box regression

4.2 Experimental Results and Analysis

Training Convergence. In the Fast R-CNN experiment, we compare the training loss of the E-OHEM and OHEM algorithms. Figure 6 shows the training loss curves of OHEM and E-OHEM in VGG16-based Fast R-CNN. The training loss of E-OHEM is smoother and converges more easily than that of OHEM, and E-OHEM reaches a lower loss value.

PASCAL VOC2007 Experimental Results Analysis. We compare E-OHEM and OHEM for VGG16-based Fast R-CNN and Faster R-CNN on PASCAL VOC2007. As shown in Table 1, at an IoU threshold of 0.5, E-OHEM improves the mAP of Fast R-CNN from 71.5% (OHEM) to 71.9%. Faster R-CNN with E-OHEM obtains an mAP of 71.3%, 0.6% higher than with OHEM. These results indicate that E-OHEM alleviates the data imbalance of OHEM, because it attends not only to misclassified examples but also to easily classified examples.
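For reference, the sketch below shows how a per-class AP at IoU 0.5 can be computed, assuming the standard PASCAL VOC2007 11-point interpolated metric was used and that detections have already been matched to ground truth. It is not the official evaluation code, and `pr_curve` and `voc07_ap` are hypothetical names.

```python
import numpy as np

def pr_curve(scores, is_tp, num_gt):
    """Cumulative precision and recall over detections ranked by score.

    is_tp[i] is 1 if detection i matched a not-yet-matched ground-truth box
    with IoU above 0.5, else 0 (the matching is assumed to be done already).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / num_gt
    precision = tp / np.maximum(tp + fp, np.finfo(float).eps)
    return recall, precision

def voc07_ap(recall, precision):
    """11-point interpolated AP: mean over t in {0, 0.1, ..., 1} of the
    maximum precision among points with recall >= t."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        above = recall >= t
        ap += (precision[above].max() if above.any() else 0.0) / 11.0
    return ap

# One class: three detections, two ground-truth objects, the middle one wrong.
rec, prec = pr_curve([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2)
print(voc07_ap(rec, prec))  # mAP averages this value over all classes
```

Because each of the 11 recall levels uses the best precision at or beyond it, a change of a few tenths of a point in mAP, as in Table 1, reflects small shifts in the ranking of detections across classes.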

The Fine-Tuning Model Experiment. In the fine-tuning stage, we use the trained VGG16-based Faster R-CNN model as the pre-trained model, with an initial learning rate of 0.0001. As shown in Table 2, on VOC2007, the fine-tuning method improves the mAP of Faster R-CNN from 69.9% to 70.5% at an IoU threshold of 0.5. The fine-tuning method also performs well in reducing the false positives of the Faster R-CNN model.

Table 2. VOC2007 test detection average precision (%) of fine-tuning model
Table 3. False positive rate on negative examples
Table 4. Detection accuracy of Faster R-CNN with different algorithms on the external damage risk detection dataset

4.3 Analysis of the Experiment Result of External Damage Risk Detection

In this study, we convert Faster R-CNN detection into binary-class object detection (engineering vehicles versus background). The engineering vehicles fall into eight categories: excavators, cement tankers, cement pump trucks, scraper trucks, tower cranes, cranes, bulldozers and engineering cars. We build an external damage risk detection dataset of 14,542 images, including augmented images; the data augmentation methods are rotation and example superposition. We use 7300 images as the training set and the remaining 7242 images as the test set. In the external damage risk detection experiment, the initial learning rate is set to 0.001. The RPN stage runs for 120K iterations with the learning rate dropped in “steps” by a factor of 0.1 every 90K iterations, and the Fast R-CNN stage runs for 60K iterations with the learning rate dropped by a factor of 0.1 every 45K iterations. The evaluation results on the external damage risk detection dataset are presented in Table 4. With VGG16, Faster R-CNN with the E-OHEM algorithm and the fine-tuning method achieves an mAP of 73.2%. Fig. 7 compares the Precision-Recall curves of OHEM and E-OHEM on the test set using VGG16-based Faster R-CNN.

Fig. 8. Positive and negative examples superposition

Fig. 9. The false positive examples of external damage risk detection

Fig. 10. The results of external damage risk detection

In practice, deep learning based object detectors share a common problem: they produce many false positives. Figure 9 shows false positives in external damage risk detection. The main reason is that these detectors select negative samples only around the positive samples in each image and therefore learn from relatively few negative samples. We adopt two methods to deal with false positives in practical applications.

  • To learn from more negative samples, as shown in Fig. 8, we augment the dataset with images in which positive and negative examples are superimposed.

  • Negative examples are added directly to the training of external damage risk detection. In the RPN stage, for a negative example image, 256 anchor boxes are randomly selected from all anchor boxes as negative samples. In the Fast R-CNN stage, B / N RoIs are randomly selected from each negative example image and labeled as background.
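The sample assignment for a negative example image can be sketched as follows. The anchor and RoI counts in the example call (20,000 anchors, 2000 RoIs) are illustrative assumptions, B / N = 64 follows the settings used throughout, the -1 "ignore" label follows the usual Faster R-CNN anchor-label convention, and the function name is hypothetical.

```python
import numpy as np

def negative_image_samples(num_anchors, num_rois, rpn_batch=256,
                           rois_per_image=64, rng=None):
    """Labels for an image that contains no engineering vehicles.

    RPN stage: 256 randomly chosen anchors are negatives (0); all other
    anchors are ignored (-1). Fast R-CNN stage: B/N randomly chosen RoIs,
    every one labelled background (0).
    """
    rng = np.random.default_rng() if rng is None else rng
    anchor_labels = np.full(num_anchors, -1, dtype=int)
    chosen = rng.choice(num_anchors, size=min(rpn_batch, num_anchors),
                        replace=False)
    anchor_labels[chosen] = 0
    roi_inds = rng.choice(num_rois, size=min(rois_per_image, num_rois),
                          replace=False)
    roi_labels = np.zeros(roi_inds.size, dtype=int)
    return anchor_labels, roi_inds, roi_labels

anchor_labels, roi_inds, roi_labels = negative_image_samples(20000, 2000)
print(int((anchor_labels == 0).sum()), roi_inds.size, roi_labels.max())  # 256 64 0
```

Since such an image has no ground truth, sampling is uniform over the whole image rather than concentrated around positives, which addresses the narrow negative coverage described above.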

With these two methods, false positives in external damage risk detection are significantly reduced, meeting practical requirements. As shown in Table 3, we measure the false positive rate (the number of false positives divided by the total number of negative examples) on 42,380 negative examples; the image-level false positive rate drops to 0.29%. Figure 10 presents the results of external damage risk detection.

5 Conclusion

In this study, E-OHEM algorithm and fine-tuned model are applied to the Faster R-CNN framework to yield higher detection accuracy in the external damage risk detection of transmission lines.

  • Compared with OHEM, E-OHEM trains more easily and converges faster in the Fast R-CNN and Faster R-CNN frameworks. Additionally, E-OHEM remedies OHEM's weakness of ignoring easily classified examples.

  • Under the framework of Faster R-CNN, E-OHEM algorithm not only greatly reduces the number of false negatives but also achieves high recognition accuracy in the external damage risk detection.

  • Deep learning based object detection algorithms inevitably produce false positives. We therefore incorporate negative example learning into the training phase, which significantly reduces the number of false positives. Furthermore, we fine-tune the model to achieve higher recognition accuracy that meets practical detection requirements.

The main limitation of the proposed algorithm is that it cannot precisely detect small objects in some distant scenes. We will address this problem in future research.