1 Introduction

Grassland resources are an important component of agriculture and animal husbandry and have a significant impact on the local ecological environment. In recent decades, the grassland in the Three-River Headwaters Region has been in a prolonged state of degradation. Compared with the 1950s, the yield per unit area has decreased by 30%–50%, the proportion of high-quality forage grass has decreased by 20%–30%, and the proportion of toxic and harmful weeds has increased by 70%–80%. The vegetation coverage of grassland has decreased by 15%–25%, the height of dominant pasture has decreased by 30%–50%, and the height of grass has decreased by more than 20% [1, 2].

From an ecological perspective, grassland species have been classified into two categories (native plant species and noxious weeds) based on their grazing value and on changes in their relative abundance in the presence or absence of grazing. Identifying and mapping the coverage of native plant species and noxious weeds is an important task in grassland monitoring. The native plant species (mainly Kobresias) are harmless and edible for livestock, while the noxious weeds (mainly Compositae, Labiatae and Gentianaceae) are inedible or even poisonous, which makes the land they occupy unsuitable for grazing. Grassland degradation involves not only a change in productivity but also a change in population structure, such as a decrease in native plant species and an increase in the proportion of noxious weeds. Mapping the native plant species and noxious weeds can help direct resource managers to critical areas in need of conservation measures.

Mapping the spatial distribution of grass species using traditional methods is complex and requires intensive field work, including the identification of species characteristics and the visual estimation of species percentage, all of which are costly and time-consuming and are sometimes impossible to accomplish due to poor accessibility [3]. With the development of sensor and image processing technology, remote sensing is now widely used in grassland degradation monitoring.

Multispectral remote sensing has made great progress in the recognition of grassland vegetation owing to its high resolution, high quality and ease of data acquisition. Friedl et al. [4] used TM data and ground observation data to calculate a range of vegetation indexes and used the most accurate index to estimate the biomass change of grassland. Hostert et al. [5] used TM and MSS data to monitor vegetation coverage in Crete, Greece, with a linear spectral unmixing method. The results showed that it was feasible to monitor vegetation coverage in this region with Landsat image data. Elmore et al. [6] used 1991–1996 TM image data and normalized difference vegetation index (NDVI) data to monitor vegetation coverage in the Irvine valley, California, using spectral mixture analysis. These studies show that multispectral data have great advantages in estimating vegetation coverage and in classification.

With the development of high-resolution remote sensing imagery (WorldView-2, for example), the classification of grassland vegetation using high-resolution images has become a research hotspot at home and abroad. Meyer et al. [7] pointed out that, when studying vegetation coverage on the Tibetan Plateau, it is very effective to combine hyperspectral and multispectral data and then use machine learning methods to calculate vegetation coverage. Wiesmair et al. [8] used WorldView-2 high-resolution images to calculate the NDVI and MXAVI2 indexes, and used the random forest algorithm to calculate the fractional vegetation cover (FVC) of the Russian Georgian region. Santos et al. [9] pointed out that the four bands added in the high-resolution WorldView-2 data contributed substantially to research on urban vegetation extraction. In terms of identification methods, current research on grassland classification and identification mostly analyzes and compares spectral reflectance features [10], and identification work remains limited to building grassland classification decision trees from a simple analysis of spectral features [11].

Machine learning methods (random forest, RF, for example) have been widely used to identify species. Ham et al. [12] applied random forest classification to hyperspectral remote sensing data, using a binary hierarchical structure in a multi-classifier system to cope with the limited training data of hyperspectral data sets. Wilschut et al. [13] defined different landscape units based on the distribution of rat holes and used random forest to establish scenes and classify images. RF has several advantages over conventional classification trees: it provides better performance and reasonable accuracy, is relatively easy to implement, and can rank the importance of prediction variables [14, 15].

Efforts have been made in grassland recognition and mapping using high-resolution imagery and the random forest algorithm, and the combination of high-resolution multispectral imagery with random forest has proved effective. However, there are few studies on the identification of native plant species and noxious weeds in the Three-River Headwaters Region, located in the hinterland of the Tibetan Plateau. The objective of this study was therefore to investigate the potential of WorldView-2 imagery for identifying the native plant species and noxious weeds and mapping their coverage.

2 Materials and Methods

2.1 Study Area and Main Native Plant Species and Noxious Weeds

The study area is located in the Tongtian River reserve of Yushu Tibetan Autonomous Prefecture, covering part of the southeast of Zhiduo County, Qinghai, China. The area lies within the Three-River Headwaters Region and has a typical alpine, cold plateau continental climate. The annual mean temperature is −0.4 °C and annual rainfall is ~394 mm. The average altitude of the study area is above 4,000 m, with abundant solar radiation. The cold season lasts nearly 10 months (Feb–Jun, Sep–Dec), with a large temperature difference between day and night. About 70%–90% of the study area is covered by vegetation, and alpine meadow is the main grassland type in this region. The location of the study area is shown in Fig. 1.

Fig. 1. Location of study area in the Three-River Headwaters Region of the Tibetan Plateau

In the study area, our field investigation showed that Kobresias (such as K. pygmaea & K. humilis, K. capillifolia & K. tibetica) were the main native plant species, accompanied by a variety of invasive noxious weeds (such as Heteropappus altaicus (Willd.) Novopokr., Lamiophlomis rotata, Gentiana straminea Maxim., Ajania tenuifolia, Leontopodium nanum and Morina kokonorica Hao). Although the native plant species and noxious weeds have grown intermixed for years, the two groups can still be distinguished by leaf shape and texture. The leaves of native plant species are always slender and intricately textured, while the leaves of noxious weeds are always broader. In addition, the native plant species are light green with a little gray in the growing season, while the noxious weeds are mainly dark green, milky white and yellow. Pictures of the major native plant species and noxious weeds are shown in Fig. 2.

Fig. 2. Native plant species and noxious weeds in the study area: (a) K. pygmaea & K. humilis (b) K. capillifolia & K. tibetica (c) Heteropappus altaicus (Willd.) Novopokr. (d) Lamiophlomis rotata (e) Gentiana straminea Maxim. (f) Ajania tenuifolia (g) Leontopodium nanum (h) Morina kokonorica Hao (Color figure online)

2.2 Imagery Acquisition and Processing

As mentioned above, the study area is located in the cold, dry hinterland of the Tibetan Plateau. The growing season of surface vegetation lasts from May to September, which is therefore the ideal period for species mapping [16, 17]. The choice of image data was limited: most of the available images contained clouds and snow and were acquired in autumn or winter, which is not a suitable time for grass identification. Therefore, after comparative analysis of the existing WorldView-2 images, we chose the imagery acquired on August 12, 2012 at 05:03:15 UTC +8 (purchased from Beijing Space Will Information Technology Co., Ltd.), with eight multispectral bands at 2.0 m resolution and a panchromatic band at 0.46 m resolution. Compared with traditional multispectral remote sensing imagery (such as QuickBird and IKONOS), WorldView-2 imagery, with its unique bands (Table 1), has shown great capability in the identification and classification of vegetation [18, 19].

Table 1. Spectral wavelength properties for WorldView-2 multispectral image

The bands of WorldView-2 imagery capture subtle spectral features of plants and are important for plant identification and classification. Spatial and texture characteristics derived from WorldView-2 data would likewise be very useful for identifying the native plant species and noxious weeds. The image data were converted from radiance to surface reflectance using the fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH [20]) algorithm built into the Environment for Visualizing Images (ENVI 5.3) software, giving the reflectance of both native plant species and noxious weeds. The Gram-Schmidt (GS) method was then used to fuse the panchromatic and multispectral bands, producing an 8-band multispectral image with a resolution of 0.5 m.

2.3 Grassland Coverage Data Collection

Field data were obtained from two field surveys. The first survey ran from 16 to 28 August, 2013, and the second from 10 to 19 August, 2017. A South S750 hand-held sub-meter GPS receiver was used to record sample locations with an accuracy of ~0.5 m after differential correction. An “X” sampling method was defined within a 30 m × 30 m square plot. The samples, located at the four vertices and the center of the plot, were 0.5 m × 0.5 m squares (Fig. 3).

Fig. 3. “X” sampling method and spectrometer measurement: (a) field measurement of a 0.5 m sample square (b) schematic of the “X” sampling method (c) measurement of the spectra of native plant species and noxious weeds with a spectrometer (d) structure of the main attribute table used for field collection
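As an illustration of the “X” layout described above, the following sketch computes the positions of the five 0.5 m quadrats inside a 30 m × 30 m plot. The local coordinate frame, the function name and the choice to place the corner quadrats flush with the plot corners are assumptions made for this example; the source does not state the exact placement.

```python
def x_sampling_quadrats(plot_size=30.0, quadrat_size=0.5):
    """Return lower-left corners (x, y) of the five quadrats of an "X" layout.

    Assumption: corner quadrats sit flush inside the plot corners and the
    fifth quadrat is centred on the plot centre. Coordinates are metres in a
    local plot frame with the origin at the plot's lower-left corner.
    """
    far = plot_size - quadrat_size          # quadrat touching the far edge
    mid = (plot_size - quadrat_size) / 2.0  # quadrat centred in the plot
    return {
        "SW": (0.0, 0.0),
        "SE": (far, 0.0),
        "NW": (0.0, far),
        "NE": (far, far),
        "centre": (mid, mid),
    }


quadrats = x_sampling_quadrats()
for name, (x, y) in quadrats.items():
    print(f"{name}: lower-left corner at ({x:.2f} m, {y:.2f} m)")
```

In practice each quadrat position would be converted to map coordinates from the GPS fix of the plot before being matched to a 0.5 m image pixel.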

Because of the harsh highland environment and the inaccessibility of much of the study area, a total of 145 samples were collected along major roads and rivers, as shown in Fig. 4. The first 30 samples were collected along the Tongtian River in 2013. The remaining 115 samples were collected in 2017 to expand the sample set. After field data processing and analysis, 128 unduplicated and accurate samples were selected for classification. For each sample, location and grassland coverage information were recorded, as shown in Fig. 3(d).

Fig. 4. Distribution of samples in the study area

According to our field surveys, the grass coverage and composition had not changed much, possibly because the study area is sparsely populated and subject to protective grazing policies. Although image acquisition and the second field investigation are 5 years apart, we considered that the vegetation in the study area had changed so little that this time span could be ignored. In addition, for grass in the study area, seasonal variation is more significant than interannual variation. Both field surveys were therefore conducted in August, consistent with the acquisition time of the WorldView-2 imagery, to ensure the consistency of vegetation identification features.

2.4 The Random Forest Algorithm

The random forest (RF) classification algorithm has been widely used in remote sensing data classification because of its fast classification speed, high accuracy and ability to handle high-dimensional data. The random forest classification model used in this study is based on the EnMAP-Box (Environmental Mapping and Analysis Program), which can be used as a plug-in for ENVI 5.1. The EnMAP-Box is a remote sensing data processing toolkit developed in IDL by the German Environmental Mapping and Analysis Program [21]. The default of 100 classification trees with the Gini coefficient was adopted, together with the samples and classification features, to create the Random Forest Classification (RFC) model file. The RFC model was used to calculate the variable importance of the features and to classify the coverage grades of the native plant species and noxious weeds.
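To make the role of the Gini criterion concrete, the sketch below computes the Gini impurity of a set of class labels and the size-weighted impurity of a candidate split, which is the quantity a classification tree tries to reduce at each node. It assumes that the "Gini coefficient" mentioned above refers to this standard impurity measure; it does not reproduce the EnMAP-Box implementation, and the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


# Hypothetical coverage grades for six samples (not data from the study).
parent = ["10-20%", "10-20%", "20-30%", "20-30%", "20-30%", "30-40%"]
left, right = parent[:2], parent[2:]

print(f"parent impurity: {gini_impurity(parent):.4f}")
print(f"impurity after split: {split_gini(left, right):.4f}")
```

A split is preferred when it lowers the weighted impurity relative to the parent node; summing these decreases per feature over all trees is one common way variable importance is derived.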

2.5 Features for Extraction

In this study, 6 types of features were calculated from the spectral values extracted from WorldView-2 imagery: 8 fused multispectral bands (FMB), the first derivative (FD) spectrum of the 8 bands, 8 vegetation indexes (VI), 2 biochemical indexes (BI), 3 tasseled cap transform features (KT) and 8 gray level co-occurrence matrix (GLCM) features, totaling 37 features (Table 2). The 8 FMBs, with a resolution of 0.5 m for identifying the grasses, were obtained from the original panchromatic and multispectral bands using the GS pan-sharpening method. A derivative spectrum tends to be sharper than the original spectral curve, so the spectral differences between native plant species and noxious weeds become more pronounced. The first derivative spectrum was therefore calculated for each band.

Table 2. The 6 feature types (n = 37) derived from the eight original WorldView-2 bands
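The source does not state how the first derivative was computed from only eight broad bands. One plausible reading is a finite difference of reflectance with respect to wavelength, sketched below. The band-centre wavelengths are approximate values assumed for illustration, the pixel spectrum is hypothetical, and the one-sided differences at the two end bands are a design choice made so that each of the 8 bands receives one derivative value.

```python
# Approximate band-centre wavelengths (nm) of the eight WorldView-2
# multispectral bands; assumed values for illustration only.
BAND_CENTRES_NM = [425.0, 480.0, 545.0, 605.0, 660.0, 725.0, 832.5, 950.0]


def first_derivative(reflectance, wavelengths=BAND_CENTRES_NM):
    """Finite-difference d(reflectance)/d(wavelength), one value per band.

    Interior bands use the slope between their two neighbours; the two end
    bands use one-sided differences so the output matches the input length.
    """
    n = len(reflectance)
    if n != len(wavelengths) or n < 2:
        raise ValueError("need matching sequences of at least two bands")
    derivative = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slope = (reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo])
        derivative.append(slope)
    return derivative


# Hypothetical vegetation-like pixel spectrum (surface reflectance, 0-1).
pixel = [0.03, 0.04, 0.08, 0.07, 0.05, 0.20, 0.40, 0.42]
fd = first_derivative(pixel)
print([round(value, 5) for value in fd])
```

For this hypothetical spectrum the steepest slope falls at the red-edge band, which is the behaviour expected of green vegetation.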

The 8 VIs were the Normalized Difference Vegetation Index (NDVI), Visible Atmospherically Resistant Index (VARI), WorldView Improved Vegetative Index (WV-IVI), WorldView Built-Up Index (WV-BU), WorldView New Iron Index (WV-NII), WorldView Soil Index (WV-SI), WorldView Non-Homogeneous Feature Difference Index (WV-NHFDI) and WorldView Water Index (WV-WI), generated with the spectral index tools of ENVI 5.3. The WorldView indexes were designed especially for vegetation identification in WorldView imagery. We expected these WorldView-specific indexes to help distinguish the two types of species.
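For reference, the two general-purpose indexes in this list follow well-known formulas, sketched below for a single pixel. Using the NIR1 band as the near-infrared input to NDVI is an assumption of this example, and the reflectance values are hypothetical; the WorldView-specific indexes are not reproduced here because their band combinations are not given in the text.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    return (nir - red) / denominator if denominator != 0 else float("nan")


def vari(red, green, blue):
    """Visible Atmospherically Resistant Index: (G - R) / (G + R - B)."""
    denominator = green + red - blue
    return (green - red) / denominator if denominator != 0 else float("nan")


# Hypothetical surface reflectances for one vegetated pixel.
blue, green, red, nir1 = 0.04, 0.08, 0.05, 0.40

print(f"NDVI = {ndvi(nir1, red):.3f}")
print(f"VARI = {vari(red, green, blue):.3f}")
```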

The biochemical and biophysical parameters of plant leaves and canopy, such as chlorophyll, carotene and anthocyanin content, are important factors affecting the spectral reflectance of vegetation. Related research shows that anthocyanins and carotenoids are the most important biochemical indicators for distinguishing noxious weeds from native plant species in the Three-River Headwaters Region [22, 23]. The Anthocyanin Reflectance Index 1 (ARI1) and Carotenoid Reflectance Index 1 (CRI1) were therefore calculated. The first three components of the tasseled cap transformation (Soil Brightness, Greenness and Wetness) were selected for classification. The mean values of the gray-level co-occurrence matrix (GLCM) were also selected.
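For orientation, the commonly used narrowband definitions of the two biochemical indexes are given below, where \rho_{\lambda} denotes reflectance at wavelength \lambda in nm. How these wavelengths were mapped onto the broad WorldView-2 bands is not stated in the text, so the expressions should be read as the standard definitions rather than the exact band arithmetic used in this study.

```latex
\mathrm{ARI1} = \frac{1}{\rho_{550}} - \frac{1}{\rho_{700}},
\qquad
\mathrm{CRI1} = \frac{1}{\rho_{510}} - \frac{1}{\rho_{550}}
```

Both indexes subtract the reciprocal reflectance of a reference wavelength from that of a wavelength where the pigment of interest absorbs, so that the chlorophyll contribution largely cancels.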

3 Results

3.1 Definition of Growth Types for Native Plant Species and Noxious Weeds

The vegetation type in the Three-River Headwaters Region is mainly alpine meadow, comprising about 10 kinds of grasses growing in mixture. It is almost impossible to identify all the vegetation species one by one. We therefore divided these meadows into two major categories, native plant species and noxious weeds, and assumed that each pixel of the grassland imagery was composed of native plant species, noxious weeds and land. According to the field investigation, the coverage of native plant species (CNPS) ranged from a minimum of 5% to a maximum of 75%, and the coverage of noxious weeds (CNW) from a minimum of 0% to a maximum of 55%. Based on expert experience, the growth types of native plant species coverage were divided into 8 grades: 0−10% (10%), 10−20%, 20−30%, 30−40%, 40−50%, 50−60%, 60−70% and 70−100%. The growth types of noxious weeds coverage were divided into 6 grades: 0−10% (10%), 10−20%, 20−30%, 30−40%, 40−50% and 50−60%. The growth types of the study area are thus combinations of the grades of native plant species and noxious weeds, which describe the composition of the grassland in 37 forms (Table 3).

Table 3. 37 growth types of native plant species and noxious weeds
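The grading scheme can be expressed as a small lookup, sketched below. The grade edges come from the text; the treatment of values that fall exactly on an edge (lower edge included, upper edge excluded, except for the last grade) is an assumption, since the source does not specify it, and the function names are hypothetical.

```python
# Upper edges (percent) of the coverage grades listed in the text.
CNPS_EDGES = [10, 20, 30, 40, 50, 60, 70, 100]  # 8 grades, the last is 70-100%
CNW_EDGES = [10, 20, 30, 40, 50, 60]            # 6 grades


def coverage_grade(percent, upper_edges):
    """Return a label such as "10-20%" for a coverage value in percent.

    Assumption: each grade includes its lower edge and excludes its upper
    edge, except the last grade, which also includes its upper edge.
    """
    if percent < 0 or percent > upper_edges[-1]:
        raise ValueError(f"coverage {percent}% is outside 0-{upper_edges[-1]}%")
    lower = 0
    for upper in upper_edges:
        if percent < upper or upper == upper_edges[-1]:
            return f"{lower}-{upper}%"
        lower = upper


def growth_type(cnps, cnw):
    """Combine the CNPS grade and the CNW grade into one growth-type label."""
    return (coverage_grade(cnps, CNPS_EDGES), coverage_grade(cnw, CNW_EDGES))


print(growth_type(19, 5))
print(growth_type(21, 5))
print(growth_type(75, 55))
```

Eight CNPS grades and six CNW grades allow 48 combinations in principle; Table 3 lists the 37 that the study uses.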

3.2 Features’ Importance

Using a smaller number of features may yield accuracy no worse than that of a larger feature set, while offering potential advantages in data storage and computational cost. RF was therefore applied to measure the relative importance of the 6 types of features for mapping the native plant species and noxious weeds. The importance of all 37 features was calculated and sorted in descending order. In this study, a threshold value of 0.3 was set for the normalized and raw variable importance of the RF, and 17 features were selected (Table 4).

Table 4. Importance of the 17 selected features calculated by RF (>= 0.3)
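The selection step amounts to ranking the features by importance and keeping those at or above the threshold, as in the sketch below. The feature names and scores are hypothetical placeholders, not the values of Table 4, and the scores are assumed to be already normalized.

```python
def select_features(importance, threshold=0.3):
    """Return (name, score) pairs with score >= threshold, highest first."""
    kept = [(name, score) for name, score in importance.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical normalized importance scores; not the values in Table 4.
scores = {
    "FD_blue": 0.12,
    "NDVI": 0.30,
    "NIR1": 1.00,
    "GLCM_mean_red": 0.45,
    "CRI1": 0.82,
}

selected = select_features(scores)
for name, score in selected:
    print(f"{name}: {score:.2f}")
```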

The 17 selected features fell into 6 types: (1) 5 original band features, (2) 3 first derivative features, (3) 2 tasseled cap transform features, (4) 1 biochemical index feature, (5) 5 texture features and (6) 1 vegetation index feature.

3.3 Classification Result and Accuracy

In this study, classification accuracy was evaluated by direct verification against measured points. The training and verification samples all came from the field investigation in the study area. The 128 samples were divided into training samples (n = 91) and verification samples (n = 37) based on expert experience.

The training samples and the verification samples do not overlap and are independent of each other. All coverage types of native plant species and noxious weeds need to be included to ensure the reliability and accuracy of the verification. These 37 samples were used to verify the pixel-based RF recognition results. However, the 10% grade interval used to define the “growth type” of native plant species and noxious weeds was set empirically, so there is a certain deviation between the classification result and the actual coverage. For example, if the CNPS or CNW of sample A is 19% and that of sample B is 21%, A would be classified as 10−20% and B as 20−30%, although the difference is only 2%. To account for the errors introduced by these self-defined coverage grades, this paper evaluates classification accuracy with two measures: “overall accuracy” (OA: exact match between estimated and measured grades) and “grade expansion accuracy” (GEA: a coverage difference of one grade (10%) is also accepted; for example, a “20%–30%” grade recognized as “30%–40%” or as “10%–20%”).
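The two accuracy measures can be sketched as follows, with grades encoded as integer indices (0 for the lowest grade, 1 for the next, and so on). The sketch assumes that GEA counts exact matches as well as predictions that are one grade off, which is consistent with GEA never being lower than OA in the reported results; the sample values are hypothetical.

```python
def overall_accuracy(measured, predicted):
    """Fraction of samples whose predicted grade equals the measured grade."""
    hits = sum(m == p for m, p in zip(measured, predicted))
    return hits / len(measured)


def grade_expansion_accuracy(measured, predicted, tolerance=1):
    """Fraction of samples predicted within `tolerance` grades of the measured grade."""
    hits = sum(abs(m - p) <= tolerance for m, p in zip(measured, predicted))
    return hits / len(measured)


# Hypothetical grade indices (0 = 0-10%, 1 = 10-20%, ...) for five samples.
measured = [1, 2, 2, 4, 0]
predicted = [1, 3, 2, 2, 1]

print(f"OA  = {overall_accuracy(measured, predicted):.1%}")
print(f"GEA = {grade_expansion_accuracy(measured, predicted):.1%}")
```

In this toy case two of five samples match exactly and two more are one grade off, so OA is 40% and GEA is 80%.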

Using the 8 original bands, the classification accuracy of native plant species was 43.2% (OA) and 59.4% (GEA), and that of noxious weeds was 62.1% (OA) and 86.4% (GEA). Using the 37 features extracted from WorldView-2, the accuracy of native plant species was 45.9% (OA) and 64.8% (GEA), and that of noxious weeds was 64.8% (OA) and 83.7% (GEA). With the 17 optimized features, the accuracy of native plant species was 51.3% (OA) and 70.2% (GEA), and that of noxious weeds was 67.5% (OA) and 89.1% (GEA), as shown in Table 5.

Table 5. OA and GEA of native plant species and noxious weeds using 8 bands, 37 features and 17 selected features
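Because the verification set contains 37 samples, each sample accounts for about 2.7 percentage points of accuracy. The sketch below shows that the reported OA values for native plant species are consistent with whole-sample counts out of 37 when the percentage is truncated to one decimal place. This is an inference from the published figures, not a procedure described by the authors.

```python
import math

N_VERIFICATION = 37  # number of verification samples stated in the text


def truncated_percent(hits, total=N_VERIFICATION):
    """Percentage hits/total, truncated (not rounded) to one decimal place."""
    return math.floor(hits / total * 1000) / 10


# Candidate counts of correctly classified samples for native plant species
# OA with 8 bands, 37 features and 17 selected features, respectively.
for hits in (16, 17, 19):
    print(f"{hits}/{N_VERIFICATION} -> {truncated_percent(hits)}%")
```

Read this way, the gain from 8 bands to 17 selected features for native plant species OA corresponds to three additional correctly classified verification samples.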

4 Discussion

4.1 Variables’ Importance for Classifying Grasses in Study Area

WorldView-2 data offer 8 original bands to identify grassland. These bands and the derivative features (such as FD, VI, BI, KT and GLCM) have different characteristics with regard to grass classification. Feature selection is an effective method for selecting the optimal number of top-ranked features of WorldView-2 data for better classification.

For the 8 WorldView-2 bands, RF successfully described the relative importance of each individual band. The five selected bands (coastal, yellow, red, NIR1, NIR2) might contribute more to classifying the native plant species and noxious weeds using RF than the blue and green bands. However, the importance of the five types of derivative features did not mirror that of the corresponding original bands. For the FD features, the first derivatives of green, NIR1 and NIR2 were more important than those of the other bands. There was also no obvious correlation in feature importance between these derivative features and the original bands. As shown in Table 4, the 17 features selected by RF were the most useful for identifying the native plant species and noxious weeds. The NIR1 band was the most important feature for classifying the two kinds of species. The variation in spectral reflectance of these species in the NIR1 portion (770 to 895 nm) may be due to significant variations in internal leaf structure and water content [24, 25]. CRI1 was the second most important feature, owing to the differences between the native plant species and noxious weeds in chlorophyll a and b, β-carotene, α-carotene, and xanthophylls [26].

In total, 5 original bands, 5 GLCM features, 3 FD features, 2 KT features, 1 BI feature and 1 VI feature were selected as the most suitable features for classification in this study. The use of RF for classifying the native plant species and noxious weeds with the selected features confirmed its utility as a variable selection method [27]. As shown in Table 5, the noxious weeds have a greater potential for being distinguished than the native plant species using WorldView-2 imagery and its derivative features. This result is also consistent with previous studies in which RF performed well in remote sensing image classification.

4.2 Classification Assessment

RF classification of native plant species using 8 bands, 37 features and 17 selected features yielded OAs of 43.2%, 45.9% and 51.3% and GEAs of 59.4%, 64.8% and 70.2%, respectively. This suggests that feature selection affects the RF classification results (8 bands < 37 features < 17 selected features). A similar pattern appeared in the OA of noxious weeds classification (62.1%, 64.8%, 67.5%), whereas the GEA of noxious weeds (86.4%, 83.7%, 89.1%) dipped with 37 features before reaching its highest value with the 17 selected features. As shown in Fig. 5, the identification accuracy of noxious weeds was better than that of the native plant species in both OA and GEA. This could be due to more regular textures and the relatively high variance of the species’ biochemical and biophysical properties, such as chlorophyll a and b, β-carotene, α-carotene, and xanthophylls. Considering the continuity of native plant species and noxious weeds coverage, GEA seems more suitable for evaluating grass classification accuracy.

Fig. 5. (a) Histogram of OA for native plant species and noxious weeds (b) Histogram of GEA for native plant species and noxious weeds

In summary, the results of this study demonstrate the possibility of classifying native plant species and noxious weeds using WorldView-2 data and confirm the robustness of the random forest algorithm for both variable selection and classification.

5 Conclusions

In this study, WorldView-2 imagery was used to map the native plant species and noxious weeds in a typical area of the Three-River Headwaters Region of China with the random forest algorithm. The experimental results indicate that classification using WorldView-2 data achieved a GEA of 86.4%, 83.7% and 89.1% for noxious weeds and a GEA of 59.4%, 64.8% and 70.2% for native plant species with the 8 original bands, 37 derived features and 17 optimized features, respectively. The multispectral and derivative data provided by WorldView-2 can therefore distinguish between native plant species and noxious weeds. The RF algorithm can be applied to feature optimization and to the mapping of native plant species and noxious weeds, and more features did not lead to better classification precision: the 17 optimized features in this study produced higher classification accuracy than the full set of 37 features.

In summary, the invasion of noxious weeds poses a serious threat to the growth of native plant species and destabilizes the ecosystem in the study area. To maintain the ecological balance, it is necessary to map the native plant species and noxious weeds. In this regard, we expect that the results of this study can be used to support precision rangeland analysis and the treatment of invaded and degraded grassland in the Three-River Headwaters Region.