
1 Introduction

As a form of green energy, photovoltaic power generation has promising development prospects and significant economic benefits, so detecting fault areas of solar panels efficiently and accurately is of great significance for photovoltaic power plants. In recent years, thermal imaging technology has made fault detection over large areas of solar panels feasible: a UAV equipped with an infrared thermal imager inspects the solar panel groups from overhead and captures infrared thermal images of the photovoltaic panel area. Scene heat balance, transmission distance, atmospheric attenuation, the imaging system of the instrument and other factors yield low-resolution images with blurred edges; their clarity is even lower than that of visible-light images, and the SNR (signal-to-noise ratio) is often low when the image composition is complex [6]. When extracting regions from an infrared thermal image, the interference caused by environmental factors in the image should not be ignored. There are many interference factors, among which the instrument box, the hot pipes connected below the panel area and the bare steel-pipe brackets of the photovoltaic panels are typical. Weather also affects the infrared thermal image; for example, images shot on cloudy days show only a small temperature difference between the panels and the environment.

Therefore, how to make better use of the limited information in the image for segmentation and extraction is the problem we study. Extracting the photovoltaic panel is the key step in thermal image segmentation, and its result directly affects the later fault detection. It is therefore important to extract the solar panels correctly and remove the environmental interference. Infrared thermal image ROI extraction methods can be divided into two types: digital image processing methods and statistics-based methods. Image processing methods range from thresholding [17], texture analysis [13] and energy-based methods [2, 7] to infrared background clutter suppression filters [14, 15].

However, these methods have special requirements on the image: when the SNR of part of the region is low, or when the gray-level gap between the target area and the interference region is small, the target area and the interference region are often mixed in the result [18]. Besides, a region needs to be extracted for further classification. Recently, image extraction methods based on statistics have been developed, such as fuzzy C-means (FCM) clustering [10, 12]. Reference [16] applies an optimized Pulse-Coupled Neural Network (PCNN) to image segmentation. In practical applications, however, statistical methods need manually pre-labelled image data, and the built model requires a large number of manual experiments, large amounts of image data and continuous analysis to become effective. Introducing statistical methods to extract the ROI of images is thus difficult and time-consuming, even for a small dataset.

Therefore, this paper proposes an infrared thermal image ROI extraction method based on an imitation gradient built from GLCM features [1, 3], designed around the characteristics of infrared image datasets. The gray-level gap, edge changes and texture structure of the image are used to construct a feature model, which is then used to extract the image ROI. The method is tested multiple times on 17 thermal images of photovoltaic panels, and the resulting mean precision and mean recall values are relatively high.

2 Method

2.1 The Proposed Method

In this paper, the region of the thermal image of the photovoltaic panel is extracted in three main steps. The first step extracts the feature images: the “contrast” and “entropy” feature images of the original image are obtained by calculating the GLCM, and the gradient image is obtained from the original image after a cross-dissolve process. In the second step, the imitate gradient image is built by weighting the three feature images. The third step fills the imitate gradient image and completes the extraction of the ROI.

Fig. 1.

The diagram of the proposed method is illustrated in Fig. 1.

2.2 Extraction of the Three Feature Image

Extraction of the Gradient Image. As the imaging range of the commercial hand-held imager is small, local temperatures easily appear high. This paper weakens the influence of image brightness with a linear covering (cross-dissolve) operator, defined as follows:

$$\begin{aligned} f_{2} (x, y)=(1-\mu )f_{0} (x, y)+\mu f_{1} (x, y) \end{aligned}$$
(1)

\((1-\mu )\) is the attenuation factor that weakens the impact of the original image \(f_{0} (x, y)\). Let \(f_{1} (x, y)\) be an all-black image; varying \(\mu \) from 0 to 1 gives the cross-dissolve image \(f_{2} (x, y)\), in which the influence of the brightness factor is reduced. Converting from RGB to HSI mode yields the I-component image \(I_{2} (x, y)\) of \(f_{2} (x, y)\).
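As a quick illustration, Eq. (1) reduces to a simple attenuation of the original image when \(f_{1}\) is all black. A minimal NumPy sketch (the function name `cross_dissolve` is ours):

```python
import numpy as np

def cross_dissolve(f0, mu):
    """Blend the original image f0 with an all-black image f1 (Eq. 1).

    Since f1 is all zeros, the result is simply the original image
    attenuated by the factor (1 - mu)."""
    f1 = np.zeros_like(f0, dtype=np.float64)
    return (1.0 - mu) * f0.astype(np.float64) + mu * f1

# A flat gray image attenuated with mu = 0.4 keeps 60% of its brightness.
img = np.full((4, 4, 3), 200.0)
out = cross_dissolve(img, 0.4)
```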

The gradient is computed from the differences of the horizontal and vertical components. In the following formulas, \(I_{2} (x, y)\) denotes a pixel of \(f_{2} (x, y)\), and the gradient image g(x, y) is obtained as:

$$\begin{aligned}&I_{2} (x, y)=0.299\times R_{f_{2}} +0.587\times G_{f_{2}} +0.114\times B_{f_{2}}\end{aligned}$$
(2)
$$\begin{aligned}&g(x, y)=|I_{2} (x, y)-I_{2} (x+1, y)|+|I_{2} (x, y)-I_{2} (x, y\mathrm{+}1)| \end{aligned}$$
(3)
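Equations (2) and (3) can be implemented directly with array slicing. A NumPy sketch (function names are ours; the output is cropped by one row and column so the forward differences align):

```python
import numpy as np

def intensity(f2):
    # Eq. (2): luminance computed from the RGB channels of the blended image.
    r, g, b = f2[..., 0], f2[..., 1], f2[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def gradient(i2):
    # Eq. (3): sum of absolute horizontal and vertical forward differences.
    gx = np.abs(i2[:-1, :-1] - i2[1:, :-1])   # |I(x, y) - I(x+1, y)|
    gy = np.abs(i2[:-1, :-1] - i2[:-1, 1:])   # |I(x, y) - I(x, y+1)|
    return gx + gy
```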

Extraction of Contrast Feature Image. Experiments show that the R channel of an infrared thermal image mainly represents the contour features, while the G channel represents the high- and low-temperature features. Merging the R and G channels can therefore represent both the contour and the regional temperature of the image.

We combine the R and G channel images with a linear weighting into a new component \(I_{3} (x, y)\) by Eq. (4). Therein, \(f_{0} (x, y)\) is the original input image, and \(R_{f_{0}}, G_{f_{0}} \) refer to its R channel and G channel images.

$$\begin{aligned} I_{3} (x, y)=0.299\times R_{f_{0}} +0.587\times G_{f_{0}} \end{aligned}$$
(4)

If the pixels that deviate from the diagonal of the GLCM have larger values, i.e. the brightness of the image varies quickly, the contrast takes a larger value, reflecting the clarity of the image and the depth of its texture: the deeper the texture, the larger the contrast value [11], and conversely the smaller. In an \(M\times N\) image, let \(S_{xy} \) denote the image window centered at the point (x, y) with distance s between its upper and lower boundaries. Within the sliding pixel block \(S_{xy}\), \(P(i, j|d, \theta )\) is the probability that a pixel with gray level i has, at the fixed distance d (the number of pixel intervals) and in the direction \(\theta \), a pixel with gray level j; the number of such pairs is recorded as \({g}_{ij}\). The contrast value con(x, y) of this point over the sliding pixel block can then be obtained for a fixed distance d and direction \(\theta \).

$$\begin{aligned}&P(i, j|d, \theta )=\frac{g_{ij}}{M\times N}~(i, j\in S_{xy}) \end{aligned}$$
(5)
$$\begin{aligned}&con(x, y)=(i-j)^{2}P^{2}(i, j|d, \theta ) \end{aligned}$$
(6)

The con(x, y) values of the four directions are computed and their mean \(\overline{con} (x, y)\) is taken as the GLCM contrast of the window \(S_{xy} \) at (x, y). We compute \(\overline{con} (x, y)\) for every pixel and assemble the values into a matrix \(c_{1} (x, y)\). A threshold \(K_{0} \) is selected to binarize \(c_{1} (x, y)\), yielding the contrast feature image \(c_{2} (x, y)\) of the image.

$$\begin{aligned} \overline{con} (x, y)=\frac{1}{4}\sum \limits _\theta (i-j)^{2}P^{2}&(i, j|d, \theta )~(i, j\in S_{xy}, \theta =0^{\circ }, 45^{\circ }, 90^{\circ }, 135^{\circ }) \end{aligned}$$
(7)
$$\begin{aligned} c_{1}&(x, y)=\overline{con} (x, y) \end{aligned}$$
(8)
$$\begin{aligned} c_{2} (x, y)&={\left\{ \begin{array}{ll} 0, &{}otherwise \\ L_{\max } , &{}K_{0} <\overline{con} (x, y) \\ \end{array}\right. } \end{aligned}$$
(9)
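A minimal single-window sketch of Eqs. (5)-(7) follows. It assumes the window already holds gray levels quantized to 0..levels-1, and it keeps the paper's squared probability term and its normalization by the window size; in the full method the window slides over the image and the mean contrast is then thresholded by \(K_0\):

```python
import numpy as np

# Directions theta = 0, 45, 90, 135 degrees as (row, col) offsets.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def glcm_contrast(window, d=1, levels=8):
    """Mean GLCM contrast of one window over the four directions,
    following Eqs. (5)-(7) literally: pair counts g_ij are normalized
    by the window size, and con = sum (i-j)^2 * P(i,j|d,theta)^2."""
    m, n = window.shape
    total = 0.0
    for dr, dc in OFFSETS:
        counts = np.zeros((levels, levels))
        for r in range(m):
            for c in range(n):
                r2, c2 = r + dr * d, c + dc * d
                if 0 <= r2 < m and 0 <= c2 < n:
                    counts[window[r, c], window[r2, c2]] += 1
        p = counts / (m * n)                        # Eq. (5)
        i, j = np.indices((levels, levels))
        total += np.sum((i - j) ** 2 * p ** 2)      # Eq. (6)
    return total / 4                                # Eq. (7)
```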

Extraction of Entropy Feature Image. Entropy reflects the heterogeneity and complexity of the texture in the image; it measures the degree of disorder and the amount of information the image carries. If the image has no texture at all, the entropy is close to 0; if the image has many fine textures, the entropy is large; if it has little texture, the entropy is small [13]. Similarly to the contrast feature, the entropy feature of a pixel over the region \(S_{xy} \) is expressed as ent(x, y), where \(\theta \) ranges over four directions. The entropy values ent(x, y) of the four directions are averaged to obtain \(\overline{ent} (x, y)\), and the entropy feature image \(e_{1} (x, y)\) of the whole image is obtained once every small block has been covered.

$$\begin{aligned} ent(x, y)=-P(i, j|d, \theta )&\log _{c} P(i, j|d, \theta ) \end{aligned}$$
(10)
$$\begin{aligned} \overline{ent} (x, y)=-\frac{1}{4}\sum \limits _\theta [P(i, j|d, \theta )\log _{c}P(i, j|&d, \theta )] (\theta =0^{\circ }, 45^{\circ }, 90^{\circ }, 135^{\circ }) \end{aligned}$$
(11)
$$\begin{aligned} e_{1} (x, y)=\overline{ent}&(x, y) \end{aligned}$$
(12)

A threshold is selected to binarize the entropy feature image \(e_{1} (x, y)\), assigning the gray levels of the two layers to 0 and \(L_{\max } \) respectively. For the convenience of the following superposition, we take the negative of the entropy values and obtain the entropy feature image \(e_{2} (x, y)\).

$$\begin{aligned} e_{2} (x, y)={\left\{ \begin{array}{ll} 0, &{}otherwise \\ L_{\max }, &{}L_{0}<e_{1} (x, y)<0 \\ \end{array}\right. } \end{aligned}$$
(13)
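A single-window sketch of Eqs. (10)-(11), analogous to the contrast sketch and under the same assumptions (gray levels already quantized, pair probabilities normalized by the window size as in Eq. (5), log base chosen as 2 here since the paper leaves c unspecified):

```python
import numpy as np

# Directions theta = 0, 45, 90, 135 degrees as (row, col) offsets.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def glcm_entropy(window, d=1, levels=8):
    """Mean GLCM entropy of one window over the four directions
    (Eqs. 10-11): ent = -sum P(i,j|d,theta) * log P(i,j|d,theta)."""
    m, n = window.shape
    total = 0.0
    for dr, dc in OFFSETS:
        counts = np.zeros((levels, levels))
        for r in range(m):
            for c in range(n):
                r2, c2 = r + dr * d, c + dc * d
                if 0 <= r2 < m and 0 <= c2 < n:
                    counts[window[r, c], window[r2, c2]] += 1
        p = counts / (m * n)                        # Eq. (5)
        nz = p[p > 0]                               # skip empty bins
        total += -np.sum(nz * np.log2(nz))          # Eq. (10)
    return total / 4                                # Eq. (11)
```

A textured window (e.g. a checkerboard) produces more distinct pixel pairs than a flat one, so its entropy is larger, matching the description above.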
Fig. 2.

(a) Three-layer feature image after linear superposition and ordering; (b) imitate gradient image.

2.3 Building of Imitate Gradient Image

The contrast feature image is processed with a median filter of \(5\times 5\) kernel size, giving the result image \(c_{3} (x, y)\) (Fig. 2).

$$\begin{aligned} c_{3} (x, y)=median[c_{2} (x, y)] \end{aligned}$$
(14)

The entropy feature image is processed with morphological erosion [8], using an elliptical operator with a \(3\times 3\) kernel, giving the entropy feature image \(e_{3} (x, y)\). Erosion reduces the disconnections in the boundary of the entropy feature image.

$$\begin{aligned} e_{3} (x, y)=e_{2} (x, y)\mathbf {\ominus } B=\{z\left| {(B)_{z} \subseteq e_{2} (x, y)} \right. \} \end{aligned}$$
(15)
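Eq. (15) can be sketched as a plain binary erosion. The cross-shaped element below is what a \(3\times 3\) elliptical structuring element typically reduces to at this kernel size (an assumption on our part):

```python
import numpy as np

def erode(binary, selem):
    """Binary erosion (Eq. 15): a pixel survives only if the structuring
    element, centered there, fits entirely inside the foreground."""
    pad_r, pad_c = selem.shape[0] // 2, selem.shape[1] // 2
    padded = np.pad(binary, ((pad_r, pad_r), (pad_c, pad_c)))
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            patch = padded[r:r + selem.shape[0], c:c + selem.shape[1]]
            out[r, c] = np.all(patch[selem == 1] == 1)
    return out

# 3x3 cross, i.e. the elliptical element at this kernel size.
SELEM = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
```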

After preprocessing the two feature images, we assign layered weights to them. The purpose of the layering is to build a gradient image similar in shape to the ROI region. The imitate gradient image is divided into three regions: the object region (ROI), the environment area, and the boundary between the environment and the panel group. \(g_{\max } (x, y)\) indicates the intensity of the overall gray-level variation of the gradient image, i.e. how obvious the boundary is. It is given by formula (16).

$$\begin{aligned} g_{\max } (x, y)=\max \{g(x, y)\} \end{aligned}$$
(16)

If \(g_{\max } (x, y)\) of the brightness-reduced image is large, the boundary of the image is obvious. Weighting the two feature images (the contrast feature image \(c_{3} (x, y)\) and the entropy feature image \(e_{3} (x, y)\)) with a value derived from \(g_{\max } (x, y)\) gives the corresponding weighting coefficient \(\gamma \) with parameters \(\alpha , \beta \):

$$\begin{aligned} \gamma =\alpha \times \mathrm{g}_{\max } (x, y)+\beta \end{aligned}$$
(17)

The contrast feature image marks the major outline; it enhances the major outline of the gradient image and sets up barriers between the environment boundary and the panel region. The parameters adopted in this experiment are \(\alpha \mathrm{=-}1, \beta =L_{\max } \), and the weighted value \(\gamma _{\mathrm{idm}}\) balances indistinct and clear gradient images.

$$\begin{aligned}&\gamma _{\mathrm{idm}} =L_{\max } -g_{\max } (x, y) \end{aligned}$$
(18)
$$\begin{aligned}&f_{idm} (x, y)={\left\{ \begin{array}{ll} \gamma _{\mathrm{idm}}, &{}c_{3} (x, y)=L_{\max } \\ 0, &{}otherwise \\ \end{array}\right. } \end{aligned}$$
(19)

The negated entropy feature image is weighted by \(\gamma _{ent}\). The parameters selected in this research are \(\alpha =0.6, \beta =0\). The entropy feature forms the bottom layer of the imitate gradient image.

$$\begin{aligned}&\gamma _{ent} =0.6\times g_{\max } (x, y) \end{aligned}$$
(20)
$$\begin{aligned}&f_{ent} (x, y)={\left\{ \begin{array}{ll} \gamma _{ent}, &{}e_{3} (x, y)=L_{\max } \\ 0,&{} otherwise \\ \end{array}\right. } \end{aligned}$$
(21)
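The layering of Eqs. (16)-(21) can be sketched as below. How overlapping layers are merged is not stated explicitly, so the pixel-wise maximum used here (the outline layer wins) is our assumption:

```python
import numpy as np

L_MAX = 255

def imitate_gradient(g, c3, e3):
    """Layer the two binary feature images into an imitate gradient image.

    The contrast layer carries the major outline with weight
    gamma_idm = L_max - g_max (Eqs. 18-19); the entropy layer fills the
    target region with the lower weight gamma_ent = 0.6 * g_max
    (Eqs. 20-21). Overlaps keep the higher (outline) value."""
    g_max = g.max()                                     # Eq. (16)
    f_idm = np.where(c3 == L_MAX, L_MAX - g_max, 0)     # Eq. (19)
    f_ent = np.where(e3 == L_MAX, 0.6 * g_max, 0)       # Eq. (21)
    return np.maximum(f_idm, f_ent)
```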

2.4 Region Filling

After seed-point filling of the imitate gradient image built from GLCM features, mis-segmentation may still occur due to holes and uneven gray levels.

Fig. 3.

The blue dashed line indicates a hole in the entropy feature image. (Color figure online)

Pre-immersion of Gradient Image and Region Filling Process. The green line marks where pre-immersion is applied, the seed points are labelled with blue points, and the region grown from the seed points is labelled with a red line (Fig. 3).

For this reason, when a seed point is selected randomly, region filling may fail to cover the targeted region because of these disconnected areas. The pre-immersion eliminates the interference of the disconnected regions, simplifies the parameters and improves the accuracy of region filling.

The immersion is carried out on the lower level of the imitate gradient image, realized as a reversed filling through threshold segmentation. This process is named pre-immersion; its purpose is to ease the setting of the negative-difference parameter. The negative difference is valued as \(\tau \), regulated within the interval \((0<\tau <\gamma _{ent})\), i.e. it must not exceed the filling value of the entropy feature image. The imitate gradient image is first immersed with the value \(\tau \), and the immersed result is denoted \(f_{dst} (x, y)\).

$$\begin{aligned} f_{dst} (x, y)={\left\{ \begin{array}{ll} f(x, y), &{}otherwise \\ \tau , &{}f(x, y)\le \tau \\ \end{array}\right. } \end{aligned}$$
(22)
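Eq. (22) is simply a clamp from below, which can be written in one line:

```python
import numpy as np

def pre_immerse(f, tau):
    """Eq. (22): raise every pixel at or below tau up to tau, flooding
    shallow basins so disconnected low regions merge before filling."""
    return np.maximum(f, tau)
```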

Region filling [9] combines pixels or subdomains into larger regions according to a predefined formation standard. The method starts from a group of seed points. Upper and lower differences, expressed as \([D_{up}, D_{low}]\), control the gray-level range for seed-point filling.

Let (x, y) denote the seed point and the pixels of its connected region. If the gray-level difference lies within the range \([D_{up}, D_{low}]\), the pixel of the connected region is labelled 1; pixels outside the range are labelled 0. Pixels labelled 1 are added to the seed-point set S until the labelling ends.

Our method fills the region from a selected seed point, which can be chosen manually or obtained by fitting the center of the maximal inscribed polygon of the contrast feature image processed with morphological methods. After region filling, holes may appear inside the region because the upper difference is limited in scope. Hence, the extracted region is further perfected by hole filling.
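The seeded filling described above can be sketched as breadth-first region growing. Whether the gray-level difference is measured against the seed value (as here) or against the neighboring pixel is our assumption:

```python
from collections import deque

import numpy as np

def region_fill(img, seed, d_low, d_up):
    """Seeded region growing: starting from `seed`, add 4-connected
    pixels whose gray level lies within [seed - d_low, seed + d_up].
    Returns a binary mask of the filled region."""
    h, w = img.shape
    base = float(img[seed])
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[seed] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if base - d_low <= img[nr, nc] <= base + d_up:
                    mask[nr, nc] = 1
                    queue.append((nr, nc))
    return mask
```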

3 Experiment

The image data were collected at Jiangsu LINYANG Power Station, located at No. 666, LINYANG Road, Qidong Economic Development Area, Jiangsu Province 226200, China (map coordinates 121.639278, 31.817825). The thermal imager adopted here is a DM63-series thermal infrared imager developed by Zhejiang Dali Technology Co., Ltd. The images were acquired from July to August 2016, shot in the morning, at noon and in the evening, under both sunny and cloudy weather. The resolution of the thermal imager is \(320\times 240\) pixels.

Our method is compared with the “grab cut” method, a high-efficiency segmentation algorithm based on foreground and background that comprehensively uses texture and boundary information to segment the image; the grab cut algorithm is set forth in paper [2]. In the method proposed in this paper, the attenuation factor \(1-\mu \) used for acquiring the gradient image is 0.4. Because imaging leaves white laces at the edge of the image, three pixels are excluded from the upper and lower boundaries when calculating \(g_{\max } (x,y)\). The images in the dataset are all sized \(320\times 240\); for convenient threshold selection, \(M\times N\times \overline{con} (x,y)\) and \(M\times N\times \overline{ent} (x,y)\) are used to threshold the feature images obtained with the GLCM method (M and N are the numbers of pixels of the image in the horizontal and vertical directions).

The parameters for obtainment of contrast feature image are set as follows:

For \(S_{xy} \), the distance is \(s=5\), the distance between pixels is \(d=4\), the gray level is set as \(L=32\), and the thresholds of the contrast feature image are \(K_0 =50\), \(L_{\max } =255\). The parameters for obtaining the entropy feature image are set as follows: the separation distance between upper and lower boundaries is \(s=5\), the distance between pixels is \(d=4\), the gray level is set as \(L=16\), and the thresholds of the entropy feature image are \(L_0 =-64\), \(L_{\max } =255\). In the experiments, seeds are chosen manually, and fewer than three seed points are used in region filling. In the prefill, we select \(\tau =20\). In the region filling step, the seed points are selected interactively; the negative difference \(D_{low} \) is 20 and the positive difference \(D_{up} \) is 15.

The grab cut method used for comparison adopts two types of seed points: one is the foreground label, viz. the PV panel (photovoltaic panel) region, and the other is the background label, viz. the non-PV region. The interactive operation is carried out twice.

A quantitative evaluation is adopted to compare the accuracy of the algorithms. The region extracted by manual operation serves as the benchmark image and is compared with the results of the two interactive methods (our method and the grab cut method). ROI extraction is tested on the 17 images, each operated on 10 times. The selected evaluation measures are precision P, recall R, the F measure \(F_\alpha \) (confidence \(\alpha =1\)) [4], which balances precision and recall (the closer the F value is to 1, the more accurate the algorithm's result [5]), and the Jaccard index J [5]. As a statistical magnitude, the closer J is to 1, the closer the segmentation result is to the standard [4].

$$\begin{aligned} P=\frac{S_1 }{S_2} \end{aligned}$$
(23)
$$\begin{aligned} R=\frac{S_1 }{S_3 } \end{aligned}$$
(24)
$$\begin{aligned} F_\alpha =\frac{(1+\alpha ^2)PR}{(\alpha ^2P+R)},(\alpha =1) \end{aligned}$$
(25)
$$\begin{aligned} J(S_1 ,S_3 )=\frac{\vert S_1 \cap S_3 \vert }{\vert S_1 \cup S_3 \vert } \end{aligned}$$
(26)

The set \(S_1 \) contains the correct pixels acquired by the algorithm (including boundary points), the set \(S_2 \) contains the regional pixels extracted by the tested method, and the set \(S_3 \) is the pixel set of the region extracted by manual operation. The operator \(\vert \cdot \vert \) denotes the number of pixels in the region.
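Eqs. (23)-(26) can be computed directly from binary masks as below. We read the Jaccard index as the usual overlap between the extracted region and the ground truth (an assumption, since the paper's set notation is ambiguous):

```python
import numpy as np

def evaluate(pred, truth):
    """Precision, recall, F1 and Jaccard from binary masks (Eqs. 23-26).

    S1 = correctly extracted pixels, S2 = all extracted pixels,
    S3 = ground-truth pixels. Jaccard is computed here as
    |pred AND truth| / |pred OR truth| (our reading of Eq. 26)."""
    s1 = np.logical_and(pred, truth).sum()   # correct pixels
    s2 = pred.sum()                          # extracted pixels
    s3 = truth.sum()                         # ground-truth pixels
    p = s1 / s2                              # Eq. (23)
    r = s1 / s3                              # Eq. (24)
    f1 = 2 * p * r / (p + r)                 # Eq. (25), alpha = 1
    j = s1 / np.logical_or(pred, truth).sum()
    return p, r, f1, j
```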

4 Result

Figure 4 compares the ROI extraction results of the algorithm presented in this paper with those of the “grab cut” method. The first row shows the original images, the second row the areas extracted by the proposed algorithm, and the third row the areas extracted by the “grab cut” comparison method. The five columns, from left to right, show five typical image forms ordered by their probability of occurrence. The first column contains the most representative images in the dataset, with large thermal differences between the photovoltaic panel areas and the external environment and abundant textures in both; compared with grab cut, the proposed algorithm produces superior extraction results. The second column shows that when the target area is heavily blended with the environment, for example the pipe below the panel zone, grab cut excludes it from the target photovoltaic panel area, which can easily cause misjudgments at later stages (faults reported in non-ROI); the method in this paper extracts the ROI accurately and integrally and demonstrates better boundary integrity.
The third column represents images with a large proportion of target area and abundant texture details; in such cases threshold segmentation tends to fragment the extracted area, while our method achieves a relatively high recall for the area extraction. The fourth column contains target areas with complex textures and obvious color differences, hence a large amount of image content; although the algorithm in this paper accurately completes the extraction of the photovoltaic panel target area, some drawbacks remain concerning discrete boundaries. The last column demonstrates a relatively small thermal difference between ROI and non-ROI and a rather blurry boundary, as occurs when the ROI has few textures and the UAV inspection is carried out in cloudy weather. Here the algorithm presented in this paper surpasses the “grab cut” method in the boundary accuracy of the extracted area as well as in recall.

Fig. 4.

Comparison of extraction results between the method in this paper and the grab cut method (Color figure online)

Figure 5 discusses the combination of the two feature images under different conditions. The first row corresponds to the original images, the second row to the contrast feature images, the third row to the entropy feature images, and the fourth row to the region extraction results. There are three main cases. I. As seen in Fig. 5b and f, the ridge of the contrast feature image may be discontinuous; in Fig. 5b, the edge of the lower part of the panel area is discontinuous, and the area marked by the entropy feature in Fig. 5c makes up for these disconnections in the filling area. II. When the marked entropy feature image is sparse and scattered (Fig. 5g), the ridge built in the contrast feature image, combined with the pre-immersion of the next step, complements it and optimizes the extracted result (Fig. 5h). In both cases the two feature images compensate for each other's defects in feature extraction and yield good extraction results. III. Both feature images have defects: the edge of the contrast feature image is discontinuous and the entropy feature image has many closed discrete regions, which leave the ridge incomplete during area filling and lead to under-segmentation (as shown in the second column of row d of Fig. 4). This needs to be improved in future work. The two feature images are complementary to each other, and the automatic pre-immersion further improves the robustness of the proposed algorithm.

Fig. 5.

Extraction of contrast feature image and entropy feature image

Fig. 6.

Imitated gradient image after weighting: (a) imitated gradient image of original image 7; (b) imitated gradient image of original image 17

Figure 6 shows that the boundary of the ROI is continuous, the interior of the region is flat and its gray-level variance is relatively small, while the interference region has small but numerous holes, a lower average gray level than the object region, and a larger gray-level variance. The method therefore builds a model that uses three different gray levels to layer the object regions, the boundary and the interference regions, which is the basis for the following region filling process.

Table 1. Comparison of performance evaluation index between the method in this paper and grab cut

Table 1 compares our method and the grab cut algorithm on ROI extraction from infrared thermal images. The algorithm in this paper is better than the grab cut algorithm in recall and in the J index (closer to the standard value 1). On this dataset, the grab cut method shows a higher precision P and a lower recall, which indicates that under-segmentation by grab cut is universal in this dataset. According to paper [19], the Otsu method is a conventional method for regional extraction of photovoltaic panels in thermal imaging. We also tested the Otsu method on our data: the average recall over the 17 images is 69.02%, the average precision is 93.16%, the F index is 0.7773 and the J index is 0.6670. This shows that automatic image thresholding is ineffective for regional extraction on this dataset; the Otsu method applies to the regional extraction of thermal-imaging photovoltaic panels in desert areas. Therefore, the primary object of comparison in this paper is the grab cut method, which is likewise a seed-based regional extraction method.

Based on the mean precision, mean recall, mean J index and mean F measure, paired one-sided t-tests are carried out on these performance indexes, with results shown in Table 2.

Table 2. One-sided t-test on performance indexes of method here and grab cut

The one-sided t-tests show that p is less than 0.05 for the indexes R, F and J, so the differences are statistically significant. The grab cut algorithm uses two types of seed points to delineate non-ROI and ROI respectively, which increases the workload and the amount of computation. Based on the performance indexes of the two algorithms in Table 1, the proposed algorithm has advantages in segmenting this dataset of thermal-imaging solar panels.

5 Conclusions

To enhance the accuracy of fault extraction in thermal-imaging areas of photovoltaic panels, this paper puts forward a method to extract regions of thermal images based on the contrast and entropy features of the GLCM. The novelty of the algorithm is that it combines the characteristics of the thermal image with a model of the target area to carry out extraction of the ROI. The experimental results demonstrate that the proposed algorithm performs well in area extraction, with relatively high precision and recall values, and is suitable for area extraction from infrared thermal images. Meanwhile, a comparison study has been conducted with the grab cut method: our method requires less manual interaction, and its operation and computation are simpler. Further study will improve the accuracy and robustness of the algorithm under circumstances of a low signal-to-noise ratio and a complex interference composition.