Selection of spectral features for land cover type classification

https://doi.org/10.1016/j.eswa.2018.02.028

Highlights

  • A selection scheme for spectral features has been proposed.

  • Classification performance has been evaluated using four types of classifiers.

  • True Skill Statistics, a unified performance metric, has been employed.

  • Features have been ranked using Condorcet ranking.

  • Even with only three features, a true skill statistic score of 81.84% has been obtained.

Abstract

Sophisticated satellite sensors help researchers collect detailed maps of the land surface in various image wavebands. These wavebands are processed to form spectral features that identify distinct land structures. However, depending on the structures of interest, only a portion of the collected features may be sufficient for identification. The aim of this study is to present a scheme for picking the most valuable spectral features derived from ASTER imagery in order to distinguish four types of tree ensembles: ‘Sugi’ (Japanese Cedar), ‘Hinoki’ (Japanese Cypress), ‘Mixed deciduous’, and ‘Others’. Forward selection, a type of wrapper technique, was employed with four types of classifiers over several train/test splits. The final rank of each feature was determined by Condorcet ranking after each classifier had been applied. Results showed that, among the four classifiers, artificial neural networks helped the selection process choose the most valuable features, and a high accuracy of 90.42% (with a true skill statistic score of 91.26%) was obtained using only the top ten features. For smaller feature sets, the support vector machines classifier also performed well, providing an accuracy of 80.33% (with a true skill statistic score of 81.84%) using only the top three features. With the help of these findings, landscape data can be represented in smaller form by keeping the spectral features with the most discriminative power, which reduces the processing time and storage needs of expert systems.

Introduction

Data about the Earth's surface are needed in many areas; land cover data, in particular, are required in many ecological applications. By monitoring the Earth's surface with remote sensing, various data can be collected over a wide range of spatial and spectral resolutions within a short time and at lower cost than land surveying. Thus, map updating and land cover classification can benefit considerably (Kavzoglu & Colkesen, 2009). Today, many methods are employed to gain information about biophysical structures, land cover, and the Earth's surface. Remotely sensed imagery is one of these methods and can provide both categorical and continuous data about the Earth. Besides collecting huge amounts of data, remotely sensed images and multi-temporal analysis allow data sources to be monitored easily, so that vital innovations and alterations can be detected. Remotely sensed data are convenient for producing land cover products, including thematic maps, because they present data in a spatially continuous format across a range of spatial and temporal scales. To acquire thematic maps, image classification is used in remote sensing: continuous measurements of a multispectral image, such as radiance and reflectance, are converted into a thematic description of the land, yielding a land cover classification (Atkinson & Naser, 2010).

For land cover classification, various types of machine learning algorithms have been utilized in remote sensing applications. Atkinson (2004) studied the use of a non-parametric K-Nearest Neighbor (KNN) classifier with spatial weighting in remote sensing. Boolean simulation was utilized together with geostatistical simulation over a remotely sensed image composed of four wavebands to map managed grassland, woodland, and rough grassland classes, and the non-centered covariance and simple inverse spatial distance weightings were compared. In another study, Bosch, Zisserman, and Munoz (2007) worked on object-based image classification with spatial pyramid matching based shape representations and automatic region selection during training. As multi-way classifiers, random ferns and random forests were utilized because of their simple training and testing structure compared with the multi-way SVM. The results were used to classify the “Caltech-101” and “Caltech-256” datasets and to compare the random forest and random fern classifiers against a multi-way SVM classifier. Dormann et al. (2007) summarized methods for accounting for spatial autocorrelation, a promising methodology for land cover examination; generalized least squares (GLS), simultaneous autoregressive models (SAR), and generalized estimating equations (GEE) are some of the methods briefly described, together with their usage areas, in their study. Horning (2010) used the random forests algorithm for its ability to incorporate many data layers together with image data. Random forests can be used for satellite and aerial image classification because it is a classification and regression type algorithm; moreover, it can utilize continuous data sets and ancillary information. Moosmann, Nowak, and Jurie (2008) addressed the use of randomized clustering forests (ERC-Forests) for content-based image organization and retrieval, and also proposed a new algorithm for image patch quantization and for faster training of randomized clustering forests. The proposed methodology is reported to give good results at learning distances between images of never-before-seen objects.

Although the aforementioned methods have great generalization potential, they suffer when the complexity of the processed dataset increases. In order to reduce this complexity and the accompanying storage needs, feature selection methods such as wrapper, filter, or ensemble techniques are commonly applied. As feature dimensionality grows, classifiers need an increasing number of training samples, but it is not always possible to obtain large training sets because of the high costs involved. To reduce this dimensionality, feature selection and feature extraction are utilized (Gilbertson & Niekerk, 2017). Basically, in feature extraction the original data are replaced with a new group of features (Sowmya & Trinder, 2000). Feature selection presents an intuitive solution for minimizing data dimensionality by collecting a subset of valuable features from the main dataset, so that the number of variables is reduced to the most important ones (Meyer, Reudenbach, Hengl, Katurji, & Nauss, 2018; Guyon & Elisseeff, 2003). To handle high dimensional datasets efficiently, feature selection has emerged as a remarkable way to provide more manageable analysis, since its efficiency has been proven in high dimensional classification problems, yielding cost effective and faster predictors. A wide variety of feature selection techniques are presented in the literature (Dessì & Pes, 2015). Feature selection algorithms are classified according to whether the dataset is labeled or not: fundamentally, they are categorized as supervised (Han et al., 2015), unsupervised (Romary, Ors, Rivoirard, & Deraisme, 2015), and semi-supervised (Sechidis & Brown, 2018). Supervised selection is basically composed of filter, wrapper, and embedded methods, as presented in Table 1. This subcategorization is based on how the methods interact with the learning algorithm, i.e., the classifier used for inferring a model. The main advantage of the filter approach is its computational and statistical scalability; however, it neglects the interaction with the classification algorithm that is used to form the predictor (Lin, Liang, Yeh, & Huang, 2014). The wrapper method interacts with the classifier, but its selection is classifier-dependent. To predict the labels of unlabeled data, the wrapper approach utilizes a single learner or an ensemble learning model, the latter also being referred to as an ensemble method (Sheikhpour, Sarram, Gharaghani, & Chahooki, 2017). The embedded method, like the wrapper method, improves the performance of a learning algorithm: during model formation it learns which features contribute most to classification accuracy. The fine distinction between the embedded method and the wrapper method is that the former uses an intrinsic model building metric while learning (Gilbertson & Niekerk, 2017).
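
To make the filter/wrapper distinction concrete, the sketch below contrasts the two families on synthetic data, using scikit-learn's SelectKBest (a filter) and SequentialFeatureSelector (a forward wrapper). The dataset, classifier, and parameter values are illustrative stand-ins, not the setup used in this study.

```python
# Minimal sketch contrasting a filter and a wrapper selector.
# Synthetic data stands in for the ASTER-derived spectral features;
# all names and parameter values here are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif

X, y = make_classification(n_samples=300, n_features=27, n_informative=10,
                           n_classes=4, random_state=0)

# Filter: scores each feature independently of any classifier
# (fast and scalable, but blind to feature interactions).
filter_sel = SelectKBest(score_func=f_classif, k=10).fit(X, y)
print("filter picks:", sorted(filter_sel.get_support(indices=True)))

# Wrapper: greedily adds the feature that most improves the wrapped
# classifier's cross-validated score (slower, but interaction-aware).
wrapper_sel = SequentialFeatureSelector(RandomForestClassifier(random_state=0),
                                        n_features_to_select=10,
                                        direction="forward", cv=3).fit(X, y)
print("wrapper picks:", sorted(wrapper_sel.get_support(indices=True)))
```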

This study aims to determine the most valuable features for land cover classification on a sample dataset from the study of Johnson, Tateishi, and Xie (2012), containing spectral information (derived from ASTER image bands) for samples of four distinct tree types: ‘Sugi’, ‘Hinoki’, ‘Mixed deciduous’, and ‘Others’. For this aim, forward feature selection, a type of wrapper technique, is employed separately with four types of classifiers: Random Forests, Naive Bayes, Support Vector Machines (SVM), and Artificial Neural Networks (ANN).

The main disadvantage of wrapper techniques is their O(D²) time complexity, where D is the number of features, which makes them time consuming for datasets with a high number of features. On the other hand, they are able to capture relations between features and thus reduce redundancy. Because of this property, and the need for the smallest feature sets giving the most reliable results, a forward selection type wrapper is used in this study.
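
The quadratic cost is easiest to see in a hand-rolled forward-selection loop such as the sketch below: up to D rounds, each refitting the wrapped classifier for up to D candidate features. The synthetic data, the accuracy-based scorer (the study itself relies on the true skill statistic), and the five-neuron MLP are illustrative assumptions rather than the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def forward_select(X, y, n_select, make_clf):
    """Greedy forward selection: repeatedly add the candidate feature that
    most improves the wrapped classifier's cross-validated score."""
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining and len(chosen) < n_select:
        trials = []
        for f in remaining:  # one refit per candidate -> O(D^2) fits overall
            cols = chosen + [f]
            score = cross_val_score(make_clf(), X[:, cols], y, cv=3).mean()
            trials.append((score, f))
        best_score, best_f = max(trials)
        chosen.append(best_f)
        remaining.remove(best_f)
    return chosen

# Synthetic stand-in data; hidden_layer_sizes=(5,) echoes the five hidden
# neurons mentioned for the final ANN model in the text.
X, y = make_classification(n_samples=200, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)
ann = lambda: MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
print(forward_select(X, y, n_select=3, make_clf=ann))
```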

One of the main problems in a classification task is the effect of bias in the data, which can cause different features to be selected each time a different training/test split is used. In order to eliminate the effect of possible bias in the data, the feature selection process is repeated one thousand times. Afterwards, the features chosen across these trials are ordered using Condorcet ranking according to their rate of preference. It was observed that, with an ANN classifier using only ten features and five neurons in its hidden layer, samples from the four distinct classes can be distinguished with an accuracy of 90.42%.
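
One common way to implement such a Condorcet-style aggregation is sketched below: pairwise preference counts are accumulated over the trial orderings, and each feature is scored by its mean rate of being preferred over the others. The helper names and toy orderings are hypothetical, and the paper's exact aggregation variant may differ.

```python
# Condorcet-style rank aggregation over repeated selection trials.
# Each trial yields an ordered list of selected features (best first);
# unselected features are treated as tied for last place.
from itertools import combinations
from collections import defaultdict

def condorcet_rank(trial_orderings, n_features):
    wins = defaultdict(int)            # wins[(i, j)] = trials where i beats j
    for order in trial_orderings:
        pos = {f: k for k, f in enumerate(order)}
        missing = len(order)           # rank assigned to unselected features
        for i, j in combinations(range(n_features), 2):
            pi, pj = pos.get(i, missing), pos.get(j, missing)
            if pi < pj:
                wins[(i, j)] += 1
            elif pj < pi:
                wins[(j, i)] += 1
    n_trials = len(trial_orderings)
    # Mean rate of being preferred over the remaining features.
    rate = {f: sum(wins[(f, g)] for g in range(n_features) if g != f)
               / ((n_features - 1) * n_trials)
            for f in range(n_features)}
    return sorted(rate, key=rate.get, reverse=True), rate

# Toy example: three trials selecting from five features.
orderings = [[3, 0, 1], [3, 1, 4], [0, 3, 1]]
ranking, rates = condorcet_rank(orderings, n_features=5)
print(ranking, {f: round(r, 2) for f, r in rates.items()})
```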

The rest of the paper is organized as follows: Section 2 contains detailed information about the materials and methods used in the study, results are presented in detail in Section 3, and finally, the study is concluded in Section 4.

Section snippets

Material and methods

This section presents the materials and methodologies used in the study.
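
As a reference point for the true skill statistic used as the unified performance metric in this study, the sketch below computes a one-vs-rest TSS from a multi-class confusion matrix, assuming the standard definition TSS = sensitivity + specificity - 1 with macro-averaging over classes; the exact multi-class formulation used in the paper may differ.

```python
# One-vs-rest true skill statistic from a multi-class confusion matrix.
# Assumes TSS = sensitivity + specificity - 1, macro-averaged over classes.
import numpy as np
from sklearn.metrics import confusion_matrix

def macro_tss(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    scores = []
    for k in range(len(labels)):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = cm.sum() - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        scores.append(sensitivity + specificity - 1.0)
    return float(np.mean(scores))

# Toy example with four hypothetical tree-type labels.
y_true = ['s', 's', 'h', 'd', 'o', 'h', 'd', 'o']
y_pred = ['s', 'h', 'h', 'd', 'o', 'h', 'o', 'o']
print(macro_tss(y_true, y_pred, labels=['s', 'h', 'd', 'o']))
```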

Results

After applying feature selection to the training and validation sets using the four types of classifiers, the features were ranked with the Condorcet algorithm as mentioned earlier. Table 3 presents the ranks of the features along with their mean rates of being preferred over the remaining features across the 1000 trials.

From Table 3, it can be clearly seen that, regardless of the chosen classifier, features #3 (b3) and #10 (pred_minus_obs_H_b1) are the top two preferred ones for discriminating samples of different classes.

Conclusions

Collecting information about biophysical structures and land cover from the Earth's surface is the main objective of remote sensing studies. Data obtained from these studies can be used effectively for reclamation and tree breeding, which provide remarkable benefits and gains in environmental and economic terms. Satellite imagery is the most commonly applied methodology for collecting such data. With the help of advanced acquisition techniques, images of the Earth's surface are collected in various wavebands.

Compliance with ethical standards

No human participants or animals were involved in this study.

Conflict of interest

The authors declare that they have no conflict of interest.

References (46)

  • B. Nakisa et al.

    Evolutionary computation algorithms for feature selection of EEG-based emotion recognition using mobile sensors

    Expert Systems With Applications

    (2018)
  • T. Romary et al.

    Unsupervised classification of multivariate geostatistical data: Two algorithms

    Computers and Geosciences

    (2015)
  • R. Sheikhpour et al.

    Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer

    Applied Soft Computing

    (2016)
  • R. Sheikhpour et al.

    A Survey on semi-supervised feature selection methods

    Pattern Recognition

    (2017)
  • A. Sowmya et al.

    Modelling and representation issues in automated feature extraction from aerial and satellite images

    ISPRS Journal of Photogrammetry & Remote Sensing

    (2000)
  • Y. Ye et al.

    Stratified sampling for feature subspace selection in random forests for high dimensional data

    Pattern Recognition

    (2013)
  • X. Zhang et al.

    Embedded feature-selection support vector machine for driving pattern recognition

    Journal of the Franklin Institute

    (2015)
  • H. Zhou et al.

    Sequential data feature selection for human motion recognition via Markov blanket

    Pattern Recognition Letters

    (2017)
  • T. Abeel et al.

    Robust biomarker identification for cancer diagnosis with ensemble feature selection methods

    Bioinformatics

    (2010)
  • P.M. Atkinson et al.

    A geostatistically weighted k-NN classifier for remotely sensed imagery

    Geographical Analysis

    (2010)
  • A. Bosch et al.

    Image classification using random forests and ferns

  • A.B. Brahim et al.

    Robust ensemble feature selection for high dimensional data sets

  • L. Breiman

    Bagging predictors

    Machine Learning

    (1996)