Classifiers vs. input variables—The drivers in image classification for land cover mapping

This paper is dedicated to Professor Walter Larcher on the occasion of his 80th birthday.
https://doi.org/10.1016/j.jag.2009.08.002

Abstract

The study investigates the performance of image classifiers for landscape-scale land cover mapping and the relevance of ancillary data for classification success, in order to assess and quantify the importance of these components in image classification. Specifically tested is the performance of maximum likelihood classification (MLC), artificial neural networks (ANN) and discriminant analysis (DA) based on Landsat7 ETM+ spectral data in combination with topographic measures and NDVI. ANN produced high accuracies of more than 75% even with limited input information, while MLC and DA produced comparable results only when ancillary data were incorporated into the classification process. The superiority of ANN classification was less pronounced at the level of single land cover classes. The use of ancillary data generally increased classification accuracy and showed a potential for increasing classification accuracy similar to that of the selection of the classifier. Therefore, a stronger focus on the development of appropriate and optimised sets of input variables is suggested. The definition and selection of land cover classes also proved to be crucial and cannot simply be adopted from existing land cover class schemes. A stronger research focus on discriminating land cover classes by their typical spectral, topographic or seasonal properties is therefore suggested to advance image classification.

Introduction

Detailed and accurate land cover data are among the most crucial information required for large-scale environmental research. Knowledge of the spatial configuration of the Earth's surface is key to assessing habitat distribution, landscape composition or land use changes and is an essential requirement for landscape modelling and scenario building, particularly in times of global change. The suitability of remote sensing for acquiring land cover data has long been recognised, but the process of generating land cover information from remotely sensed data is still far from being standardised or optimised (Foody, 2002; Lu and Weng, 2007). An extensive variety of multi-spectral image classification methods has been developed and was recently reviewed by Lu and Weng (2007). None of these classifiers is described as inherently superior to any other, as their performance largely depends on the kind and quality of the input data for the classification and on the desired output. Even unsupervised ISODATA classification has been used successfully, for example to extract specific, spectrally distinct features such as forests, fire scars, coastlines or urban areas (Ekercin, 2007; Heinl et al., 2006; Kaya and Curran, 2006; Souza et al., 2003). However, for obtaining thematic land cover data, supervised classification is to be preferred in most cases (Foody, 2001; Jensen, 2005; Kavzoglu, 2009), as the desired output classes are already pre-defined and post-classification analyses and class aggregations are not necessarily required. In particular, advanced approaches such as artificial neural networks, fuzzy sets or support vector machines have produced higher levels of accuracy than, for example, the popular maximum likelihood classifier or discriminant analysis (Berberoglu et al., 2007; Dixon and Candade, 2008; Jensen, 2005; Kavzoglu and Mather, 2003; Kavzoglu and Reis, 2008; Pal and Mather, 2005).
However, only a few specific comparisons have been published (Berberoglu et al., 2007; Hardin, 2000; Kavzoglu and Reis, 2008; Paola and Schowengerdt, 1995; Zhang et al., 2007). These usually document a superiority of the advanced approaches, although maximum likelihood classification has also been suggested as the better alternative (Carvalho et al., 2004). The use of different numbers and types of land cover classes and of different sample sizes complicates a quantitative comparison of the results. Despite its often documented inferiority in classification success, maximum likelihood classification is still one of the most widely used classification algorithms (Jensen, 2005), most likely also due to advantages in data handling and processing times (Paola and Schowengerdt, 1995). Therefore, many applied landscape-scale studies and much land use/land cover research rely on these standard classification approaches (Brandt and Townsend, 2006; Cushman and Wallin, 2000; Jianchu et al., 2005; Joy et al., 2003; Ruiz-Luna and Berlanga-Robles, 2003). In contrast, advanced approaches are primarily limited to methodological studies for optimising the classification process, often using only very limited sample sizes (Fassnacht et al., 2006; Foody, 2001; Kavzoglu and Mather, 2003; Kavzoglu and Reis, 2008; Ouyang and Ma, 2006; Paola and Schowengerdt, 1997; Yemefack et al., 2006).

Besides the type of image classifier, the use of ancillary data is recognised as being crucial for the performance of image classification. Ancillary data have been used successfully to improve image classification, especially by including topographic measures, NDVI or texture measures in the classification process in addition to the spectral information, in order to separate features with similar spectral properties (Berberoglu et al., 2007; Carpenter et al., 1999; Giannetti et al., 2001; Islam et al., 2008; Joy et al., 2003; Kozak et al., 2008; Lu and Weng, 2007; Saadat et al., 2008; Watanachaturaporn et al., 2008).
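As an illustration of one such ancillary layer, the sketch below derives NDVI from two spectral bands. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red) and assumes ETM+ band 4 as the near-infrared band and band 3 as the red band. The text does not state how NDVI was calculated in the study, so the function names and the sample values are hypothetical and the code should be read as a minimal sketch rather than the authors' procedure.

```python
def ndvi(nir, red):
    """Return the NDVI for one pixel, or None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None  # undefined (e.g. a no-data pixel); the caller decides how to treat it
    return (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D bands (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2 x 2 reflectance values, not taken from the study scene.
band4_nir = [[0.50, 0.30], [0.05, 0.0]]
band3_red = [[0.10, 0.30], [0.15, 0.0]]
ndvi_layer = ndvi_band(band4_nir, band3_red)
```

The resulting layer can be stacked with the spectral bands as one additional input variable per pixel. Dense vegetation tends towards values near 1, while bare or sealed surfaces and water lie near or below 0.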

Despite decades of extensive research on classifiers and ancillary data, comparisons and applications of image classifiers using standardised samples at the landscape scale are largely missing (Lu and Weng, 2007). To close this gap, the present study jointly examines the performance of different classifiers and the importance of ancillary data for landscape-scale land cover assessments using pre-defined land cover classes. It therefore investigates the effect of a variety of selected and widely accessible input variables and classifiers on classification accuracy, both overall and at the level of specific land cover classes, and assesses and quantifies the importance of these components in image classification. We hypothesize that advanced classification approaches achieve higher overall accuracies than standard classifiers when little or no ancillary data are used, while incorporating ancillary data reduces the importance of the type of classifier. Specifically compared is the performance of maximum likelihood classification, discriminant analysis and artificial neural networks, which presumably cover the most widely used hard classifiers and represent both parametric and non-parametric classifiers. Ancillary data in the form of topographic measures and NDVI were incorporated step-wise into the classification to document the relevance of these input data. Classification results at the level of land cover classes are discussed in the context of reference data selection and land cover class definition.
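Accuracy is considered here both overall and per land cover class. The sketch below shows one common way to obtain both from a confusion matrix: overall accuracy as the share of correctly classified reference samples, and a per-class (producer's) accuracy as the share of each reference class that was classified correctly. The text does not specify which per-class measure the study used, so this choice is an assumption, and the class names and counts are invented for illustration.

```python
def overall_accuracy(matrix):
    """Share of all reference samples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def producers_accuracy(matrix):
    """Per-class accuracy: correct samples divided by the reference total of that class."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]


# Hypothetical counts; rows = reference class, columns = predicted class.
classes = ["forest", "grassland", "settlement"]
confusion = [
    [45, 4, 1],
    [6, 30, 4],
    [2, 3, 5],
]
oa = overall_accuracy(confusion)
per_class = dict(zip(classes, producers_accuracy(confusion)))
```

In this invented example the overall accuracy is 80%, yet only half of the settlement samples are classified correctly. A high overall value can therefore hide weak individual classes, which is why both levels are worth reporting.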

Spectral data properties and study region

The spectral information for the image classification was acquired by the Landsat7 ETM+ sensor (path 193/row 027) on 13 September 1999. The imagery was provided by the Global Land Cover Facility (GLCF) (www.landcover.org) as an orthorectified GeoCover data set in GeoTIFF format with UTM projection (UTM 32N), WGS-84 datum, and 28.5 m pixel size. The six bands representing the visible and infrared spectrum (ETM+ bands 1–5, 7) were used in the study. The scene was cut to 1650 × 3300 pixels to fit to the

Overall classification accuracy related to classifiers and input variables

The classifications by DA and MLC produced very similar overall accuracies for all input combinations. Accuracies were in the range of 55–60% when only spectral data (ETM) were used as input variables and reached about 75% when ancillary data were included (Fig. 2). The classifications using ANN produced higher overall accuracies than MLC and DA for all input combinations, reaching about 75% when using only spectral data (ETM) and 85% with ancillary data. Maximum overall classification

The relevance of input variables and classifiers for image classification accuracy

Spectral data, topographic measures and NDVI data were used to test their performance in image classifications by maximum likelihood classification (MLC), discriminant analysis (DA) and artificial neural networks (ANN). The use of ancillary data significantly improved the classification accuracy for the present data set compared to using spectral data (ETM) only. These increases in overall accuracy were observed independent of the classifier. Especially incorporating topographic information

Conclusion

The comparison of the performance of MLC, DA and ANN in image classification revealed advantages of ANN classifications in accuracy, both overall and for single land cover classes. The incorporation of ancillary data into the classification process clearly increased classification accuracy overall and at the level of single land cover classes, independent of the classifier used. However, ANN produced high accuracies also with limited input information, while MLC and DA produced comparable

Acknowledgements

The research was kindly supported by the University of Innsbruck Vice Rectorate for Research and the European Academy Bolzano (EURAC). The authors thank two anonymous reviewers for their valuable comments and suggestions.

References (57)

  • X. Jianchu et al. Exploring the spatial and temporal dynamics of land use in Xizhuang watershed of Yunnan, southwest China. International Journal of Applied Earth Observation and Geoinformation (2005)
  • T. Kavzoglu. Increasing the accuracy of neural network classification using refined training data. Environmental Modelling & Software (2009)
  • S. Kaya et al. Monitoring urban growth on the European side of the Istanbul metropolitan area: A case study. International Journal of Applied Earth Observation and Geoinformation (2006)
  • J. Kozak et al. European forest cover mapping with high resolution satellite data: the Carpathians case study. International Journal of Applied Earth Observation and Geoinformation (2008)
  • S.K. Langley et al. A comparison of single date and multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid Environments (2001)
  • R. Latifovic et al. Accuracy assessment using sub-pixel fractional error matrices of global land cover products derived from satellite data. Remote Sensing of Environment (2004)
  • J.F. Mas. Mapping land use/cover in a tropical coastal area using satellite sensor data, GIS and artificial neural networks. Estuarine Coastal and Shelf Science (2004)
  • D.M. Muchoney et al. Pixel- and site-based calibration and validation methods for evaluating supervised classification of remotely sensed data. Remote Sensing of Environment (2002)
  • H. Saadat et al. Landform classification from a digital elevation model and satellite imagery. Geomorphology (2008)
  • D.P. Shrestha et al. Land use classification in mountainous areas: integration of image processing, digital elevation data and field knowledge (application to Nepal). International Journal of Applied Earth Observation and Geoinformation (2001)
  • C. Souza et al. Mapping forest degradation in Eastern Amazon from SPOT4 through spectral mixture models. Remote Sensing of Environment (2003)
  • M. Yemefack et al. Investigating relationships between Landsat-7 ETM+ data and spatial segregation of LULC types under shifting agriculture in southern Cameroon. International Journal of Applied Earth Observation and Geoinformation (2006)
  • C. Bishop. Neural Networks for Pattern Recognition (1996)
  • Bossard, M., Feranec, J., Otahel, J., 2000. CORINE Land Cover Technical Guide—Addendum 2000. Technical Report No 40...
  • J.S. Brandt et al. Land use–land cover conversion, regeneration and degradation in the high elevation Bolivian Andes. Landscape Ecology (2006)
  • L. Capao et al. An approach for land cover mapping with multi-temporal MERIS imagery
  • S.A. Cushman et al. Rates and patterns of landscape change in the Central Sikhote-alin Mountains, Russian Far East. Landscape Ecology (2000)
  • R.S. Defries et al. NDVI-derived land-cover classifications at a global-scale. International Journal of Remote Sensing (1994)