Classifiers vs. input variables—The drivers in image classification for land cover mapping

doi:10.1016/j.jag.2009.08.002

International Journal of Applied Earth Observation and Geoinformation

Volume 11, Issue 6, December 2009, Pages 423-430

This paper is dedicated to Professor Walter Larcher on the occasion of his 80th birthday.

https://doi.org/10.1016/j.jag.2009.08.002

Abstract

The study investigates the performance of image classifiers for landscape-scale land cover mapping and the relevance of ancillary data for classification success, in order to assess and quantify the importance of these components in image classification. Specifically, it tests the performance of maximum likelihood classification (MLC), artificial neural networks (ANN) and discriminant analysis (DA) based on Landsat 7 ETM+ spectral data in combination with topographic measures and NDVI. ANN produced high accuracies of more than 75% even with limited input information, while MLC and DA produced comparable results only when ancillary data were incorporated into the classification process. The superiority of ANN classification was less pronounced at the level of single land cover classes. The use of ancillary data generally increased classification accuracy and showed a potential for increasing classification accuracy similar to that of the selection of the classifier. Therefore, a stronger focus on the development of appropriate and optimised sets of input variables is suggested. The definition and selection of land cover classes also proved crucial, and classes cannot simply be adopted from existing land cover class schemes. A stronger research focus on discriminating land cover classes by their typical spectral, topographic or seasonal properties is therefore suggested to advance image classification.

Introduction

Detailed and accurate land cover data are among the most crucial types of information required for large-scale environmental research. Knowledge of the spatial configuration of the Earth's surface is key for assessing habitat distribution, landscape composition or land use changes, and is an essential requirement for landscape modelling and scenario building, particularly in times of global change. The suitability of remote sensing for acquiring land cover data has long been recognised, but the process of generating land cover information from remotely sensed data is still far from being standardised or optimised (Foody, 2002, Lu and Weng, 2007). An extensive variety of multi-spectral image classification methods has been developed and was recently reviewed by Lu and Weng (2007). None of these classifiers is described as inherently superior to any other, as their performance largely depends on the kind and quality of the input data and on the desired output. Even unsupervised ISODATA classification has been used successfully, for example to extract specific, spectrally distinct features such as forests, fire scars, coastlines or urban areas (Ekercin, 2007, Heinl et al., 2006, Kaya and Curran, 2006, Souza et al., 2003). However, for obtaining thematic land cover data, supervised classification is to be preferred in most cases (Foody, 2001, Jensen, 2005, Kavzoglu, 2009), as the desired output classes are pre-defined and post-classification analyses and class aggregations are not necessarily required. In particular, advanced approaches such as artificial neural networks, fuzzy sets or support vector machines have produced higher levels of accuracy than, for example, the popular maximum likelihood classifier or discriminant analysis (Berberoglu et al., 2007, Dixon and Candade, 2008, Jensen, 2005, Kavzoglu and Mather, 2003, Kavzoglu and Reis, 2008, Pal and Mather, 2005). However, only a few specific comparisons have been published (Berberoglu et al., 2007, Hardin, 2000, Kavzoglu and Reis, 2008, Paola and Schowengerdt, 1995, Zhang et al., 2007), usually documenting a superiority of the advanced approaches, but in some cases also suggesting maximum likelihood classification as the better alternative (Carvalho et al., 2004). The use of different numbers and types of land cover classes and sample sizes complicates a quantitative comparison of the results. Despite its often documented inferiority in classification success, maximum likelihood classification is still one of the most widely used classification algorithms (Jensen, 2005), most likely also due to advantages in data handling and processing times (Paola and Schowengerdt, 1995). Therefore, many applied landscape-scale studies and much land use/land cover research rely on these standard classification approaches (Brandt and Townsend, 2006, Cushman and Wallin, 2000, Jianchu et al., 2005, Joy et al., 2003, Ruiz-Luna and Berlanga-Robles, 2003). In contrast, advanced approaches are primarily limited to methodological studies for optimising the classification process, often using only very limited sample sizes (Fassnacht et al., 2006, Foody, 2001, Kavzoglu and Mather, 2003, Kavzoglu and Reis, 2008, Ouyang and Ma, 2006, Paola and Schowengerdt, 1997, Yemefack et al., 2006).

Besides the type of image classifier, the use of ancillary data is recognised as crucial for the performance of image classification. Ancillary data have been used successfully to improve image classification, especially by including topographic measures, NDVI or texture measures in the classification process in addition to the spectral information, in order to separate features with similar spectral properties (Berberoglu et al., 2007, Carpenter et al., 1999, Giannetti et al., 2001, Islam et al., 2008, Joy et al., 2003, Kozak et al., 2008, Lu and Weng, 2007, Saadat et al., 2008, Watanachaturaporn et al., 2008).

Despite decades of extensive research on classifiers and ancillary data, comparisons and applications of image classifiers using standardised samples at the landscape scale are largely missing (Lu and Weng, 2007). To address this gap, the present study examines both the performance of different classifiers and the importance of ancillary data for landscape-scale land cover assessments using pre-defined land cover classes. It investigates the effect of a variety of selected and widely accessible input variables and classifiers on classification accuracy, overall and at the level of specific land cover classes, and assesses and quantifies the importance of these components in image classification. We hypothesize that advanced classification approaches achieve higher overall accuracies than standard classifiers when little or no ancillary data are used, while incorporating ancillary data reduces the importance of the type of classifier. Specifically, the study compares the performance of maximum likelihood classification, discriminant analysis and artificial neural networks, which presumably cover the most widely used hard classifiers and represent both parametric and non-parametric classifiers. Ancillary data in the form of topographic measures and NDVI were incorporated step-wise into the classification to document the relevance of these input data. Classification results at the level of land cover classes are discussed in the context of reference data selection and land cover class definition.
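
To make the parametric end of this comparison concrete, the following is a minimal sketch of Gaussian maximum likelihood classification for two input variables, assuming equal prior probabilities for all classes. It illustrates the general technique only and is not the software or configuration used in the study; the function names, class labels and training values are hypothetical.

```python
import math


def fit_class(samples):
    """Estimate the mean vector and 2x2 covariance of (x, y) training pixels."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_likelihood(pixel, mean, cov):
    """Log of the bivariate normal density of `pixel` for one class."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = pixel[0] - mean[0]
    dy = pixel[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + maha) - math.log(2.0 * math.pi)


def classify(pixel, models):
    """Assign the class with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda name: log_likelihood(pixel, *models[name]))


# Hypothetical training pixels: (red, near-infrared) values per class.
training = {
    "forest": [(30, 80), (32, 85), (28, 78), (31, 83)],
    "water": [(20, 10), (22, 12), (19, 9), (21, 13)],
}
models = {name: fit_class(pixels) for name, pixels in training.items()}

print(classify((29, 81), models))
print(classify((21, 11), models))
```

Adding an ancillary variable such as elevation would extend the mean vector and covariance matrix by one dimension; the decision rule itself stays the same.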

Section snippets

Spectral data properties and study region

The spectral information for the image classification was acquired by the Landsat 7 ETM+ sensor (path 193/row 027) on 13 September 1999. The imagery was provided by the Global Land Cover Facility (GLCF) (www.landcover.org) as an orthorectified GeoCover data set in GeoTIFF format with UTM projection (UTM 32N), WGS-84 datum, and 28.5 m pixel size. The six bands representing the visible and infrared spectrum (ETM+ bands 1–5, 7) were used in the study. The scene was cut to 1650 × 3300 pixels to fit to the
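
As a small worked example of the figures in this snippet, the sketch below converts the stated scene size and pixel size into a ground footprint and computes NDVI with the standard formula. It assumes the conventional ETM+ band assignment (band 3 red, band 4 near-infrared); the snippet does not say how the study derived NDVI, which dimension of the 1650 × 3300 subset is columns, or what the input values were, so the orientation and the sample pixel values are assumptions.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return (nir - red) / total


PIXEL_SIZE_M = 28.5      # pixel size stated in the text
COLS, ROWS = 1650, 3300  # subset size from the text; orientation assumed

width_km = COLS * PIXEL_SIZE_M / 1000.0
height_km = ROWS * PIXEL_SIZE_M / 1000.0
area_km2 = width_km * height_km

print(f"footprint: {width_km:.3f} km x {height_km:.3f} km")
print(f"area: {area_km2:.1f} km^2")

# Hypothetical band 4 (NIR) and band 3 (red) values for a vegetated pixel.
print(f"NDVI: {ndvi(80, 20):.2f}")
```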

Overall classification accuracy related to classifiers and input variables

The classifications by DA and MLC produced very similar overall accuracies for all input combinations. Accuracies were in the range of 55–60% when only spectral data (ETM) were used as input variables and reached about 75% when ancillary data were included (Fig. 2). The classifications using ANN produced higher overall accuracies than MLC and DA for all input combinations, reaching about 75% with only spectral data (ETM) and 85% with ancillary data. Maximum overall classification
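
For readers unfamiliar with the metric, the sketch below shows how overall accuracy is typically derived from a confusion matrix, along with per-class producer's and user's accuracy. The matrix values are invented for illustration and are not results from the study; the layout (rows as reference classes, columns as predicted classes) is an assumption of this example.

```python
def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix, k):
    """Correctly classified samples of class k divided by its reference total."""
    return matrix[k][k] / sum(matrix[k])


def users_accuracy(matrix, k):
    """Correctly classified samples of class k divided by its predicted total."""
    return matrix[k][k] / sum(row[k] for row in matrix)


# Hypothetical 3-class matrix: rows = reference, columns = predicted.
confusion = [
    [40, 5, 5],
    [10, 30, 10],
    [0, 5, 45],
]

print(f"overall accuracy: {overall_accuracy(confusion):.3f}")
for k in range(len(confusion)):
    print(k, f"{producers_accuracy(confusion, k):.2f}",
          f"{users_accuracy(confusion, k):.2f}")
```

A high overall accuracy can hide weak individual classes, which is why the per-class measures are worth reporting alongside it.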

The relevance of input variables and classifiers for image classification accuracy

Spectral data, topographic measures and NDVI data were used to test their performance in image classifications by maximum likelihood classification (MLC), discriminant analysis (DA) and artificial neural networks (ANN). The use of ancillary data significantly improved the classification accuracy for the present data set compared to using spectral data (ETM) only. These increases in overall accuracy were observed independent of the classifier. Especially incorporating topographic information
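
The relative weight of the two factors can be illustrated with the approximate overall accuracies quoted in the results snippet. The sketch below is back-of-the-envelope arithmetic, not the study's own analysis: it uses 57.5% as the midpoint of the reported 55–60% range (an assumption) and treats "about 75%" and "85%" as exact values.

```python
# Approximate overall accuracies (%) taken from the results snippet.
# 57.5 is the midpoint of the reported 55-60% range (assumption).
accuracy = {
    ("MLC/DA", "ETM only"): 57.5,
    ("MLC/DA", "ETM + ancillary"): 75.0,
    ("ANN", "ETM only"): 75.0,
    ("ANN", "ETM + ancillary"): 85.0,
}


def gain_from_ancillary(classifier):
    """Percentage points gained by adding ancillary data, classifier fixed."""
    return (accuracy[(classifier, "ETM + ancillary")]
            - accuracy[(classifier, "ETM only")])


def gain_from_classifier(inputs):
    """Percentage points gained by switching from MLC/DA to ANN, inputs fixed."""
    return accuracy[("ANN", inputs)] - accuracy[("MLC/DA", inputs)]


for classifier in ("MLC/DA", "ANN"):
    print(f"ancillary data, {classifier}: +{gain_from_ancillary(classifier):.1f}")
for inputs in ("ETM only", "ETM + ancillary"):
    print(f"ANN instead of MLC/DA, {inputs}: +{gain_from_classifier(inputs):.1f}")
```

With these rounded values, each factor is worth roughly 10 to 18 percentage points, which is consistent with the abstract's statement that ancillary data and classifier selection show a similar potential for increasing accuracy.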

Conclusion

The comparison of the performance of MLC, DA and ANN in image classification revealed advantages of ANN classifications in classification accuracy, both overall and for single land cover classes. The incorporation of ancillary data into the classification process clearly increased classification accuracy overall and at the level of single land cover classes, independent of the classifier used. However, ANN produced high accuracies even with limited input information, while MLC and DA produced comparable

Acknowledgements

The research was kindly supported by the University of Innsbruck Vice Rectorate for Research and the European Academy Bolzano (EURAC). The authors thank two anonymous reviewers for their valuable comments and suggestions.

References (57)

M. Bach et al. Accuracy and congruency of three different digital land-use maps. Landscape and Urban Planning (2006)
S. Berberoglu et al. Texture classification of Mediterranean land cover. International Journal of Applied Earth Observation and Geoinformation (2007)
E.C. Brown de Colstoun et al. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment (2003)
G.A. Carpenter et al. A neural network method for efficient vegetation mapping. Remote Sensing of Environment (1999)
H. Carrão et al. Contribution of multispectral and multitemporal information from MODIS images to land cover classification. Remote Sensing of Environment (2008)
L.M.T.D. Carvalho et al. Selection of imagery data and classifiers for mapping Brazilian semideciduous Atlantic forests. International Journal of Applied Earth Observation and Geoinformation (2004)
J. Cihlar et al. Land cover classification with AVHRR multichannel composites in northern environments. Remote Sensing of Environment (1996)
K.S. Fassnacht et al. Key issues in making and using satellite-based maps in ecology: a primer. Forest Ecology and Management (2006)
G.M. Foody. Status of land cover classification accuracy assessment. Remote Sensing of Environment (2002)
F. Giannetti et al. Integrated use of satellite images, DEMs, soil and substrate data in studying mountainous lands. International Journal of Applied Earth Observation and Geoinformation (2001)
X. Jianchu et al. Exploring the spatial and temporal dynamics of land use in Xizhuang watershed of Yunnan, southwest China. International Journal of Applied Earth Observation and Geoinformation (2005)
T. Kavzoglu. Increasing the accuracy of neural network classification using refined training data. Environmental Modelling & Software (2009)
S. Kaya et al. Monitoring urban growth on the European side of the Istanbul metropolitan area: A case study. International Journal of Applied Earth Observation and Geoinformation (2006)
J. Kozak et al. European forest cover mapping with high resolution satellite data: the Carpathians case study. International Journal of Applied Earth Observation and Geoinformation (2008)
S.K. Langley et al. A comparison of single date and multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid Environments (2001)
R. Latifovic et al. Accuracy assessment using sub-pixel fractional error matrices of global land cover products derived from satellite data. Remote Sensing of Environment (2004)
J.F. Mas. Mapping land use/cover in a tropical coastal area using satellite sensor data, GIS and artificial neural networks. Estuarine Coastal and Shelf Science (2004)
D.M. Muchoney et al. Pixel- and site-based calibration and validation methods for evaluating supervised classification of remotely sensed data. Remote Sensing of Environment (2002)
H. Saadat et al. Landform classification from a digital elevation model and satellite imagery. Geomorphology (2008)
D.P. Shrestha et al. Land use classification in mountainous areas: integration of image processing, digital elevation data and field knowledge (application to Nepal). International Journal of Applied Earth Observation and Geoinformation (2001)
C. Souza et al. Mapping forest degradation in Eastern Amazon from SPOT4 through spectral mixture models. Remote Sensing of Environment (2003)
M. Yemefack et al. Investigating relationships between Landsat-7 ETM+ data and spatial segregation of LULC types under shifting agriculture in southern Cameroon. International Journal of Applied Earth Observation and Geoinformation (2006)
C. Bishop. Neural Networks for Pattern Recognition (1996)

Bossard, M., Feranec, J., Otahel, J., 2000. CORINE Land Cover Technical Guide—Addendum 2000. Technical Report No 40...

J.S. Brandt et al. Land use–land cover conversion, regeneration and degradation in the high elevation Bolivian Andes. Landscape Ecology (2006)
L. Capao et al. An approach for land cover mapping with multi-temporal MERIS imagery
S.A. Cushman et al. Rates and patterns of landscape change in the Central Sikhote-alin Mountains, Russian Far East. Landscape Ecology (2000)
R.S. Defries et al. NDVI-derived land-cover classifications at a global-scale. International Journal of Remote Sensing (1994)
