Using multiple Landsat scenes in an ensemble classifier reduces classification error in a stable nearshore environment

doi:10.1016/j.jag.2013.11.015

International Journal of Applied Earth Observation and Geoinformation

Volume 28, May 2014, Pages 90-101

Highlights

• A multi-scene ensemble classifier outperforms single-scene classifications.
• Complete coverage of the study area is achieved from multiple cloudy scenes.
• Random forest outperforms other classification algorithms.
• Balancing of the training data reduces classification bias.
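
As an illustration of the last highlight, the sketch below balances a training set by random undersampling to the size of the smallest class. This is only one common way of balancing; the procedure actually used in the study is not described in this section, and the function name, arguments and seed are hypothetical.

```python
import numpy as np

def balance_by_undersampling(features, labels, seed=0):
    """Randomly keep the same number of samples per class (assumed approach).

    features: array of shape (n_samples, n_features)
    labels:   integer array of shape (n_samples,)
    Returns the balanced (features, labels), in original sample order.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()  # size of the rarest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()
    return features[keep], labels[keep]
```

Undersampling discards observations from the common classes; oversampling the rare classes is an alternative when field data are scarce.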

Abstract

Medium-scale land cover maps are traditionally created on the basis of a single cloud-free satellite scene, leaving information present in other scenes unused. Using 1309 field observations and 20 cloud- and error-affected Landsat scenes covering Zanzibar Island, this study demonstrates that the use of multiple scenes can both allow complete coverage of the study area in the absence of cloud-free scenes and obtain substantially improved classification accuracy. Automated processing of individual scenes includes derivation of spectral features for use in classification, identification of clouds, shadows and the land/water boundary, and random forest-based land cover classification. An ensemble classifier is then created from the single-scene classifications by voting. The accuracy achieved by the ensemble classifier is 70.4%, compared to an average of 62.9% for the individual scenes, and the ensemble classifier achieves complete coverage of the study area while the maximum coverage for a single scene is 1209 of the 1309 field sites. Given the free availability of Landsat data, these results should encourage increased use of multiple scenes in land cover classification and reduced reliance on the traditional single-scene methodology.

Introduction

Land cover mapping is one of the most common applications of remote sensing. In its supervised form, field observations are used to train a classifier to predict the land cover of an area from its spectral radiance or reflectance, texture (Purkis et al., 2006) and, in object-based classification, the shape, size and context of image segments (Blaschke, 2010). These values are typically derived from a single scene of remote sensing data, or from a combination of scenes acquired from different sensors (data fusion, e.g. Bejarano et al., 2010). For tropical nearshore areas, the focus of this study, supervised classification has been used extensively with passive optical satellite and airborne data to map both terrestrial and submerged areas, including ecologically and economically important mangrove forests, seagrass beds, and coral reefs (Green et al., 2000). Submerged but optically shallow areas have typically been defined on the basis of either reef geomorphology (Purkis et al., 2010, Andréfouët and Guzman, 2005, Smith et al., 1975, Suzuki et al., 2001) or dominant substrate (Green et al., 1996, Ahmad and Neil, 1994, Purkis et al., 2002), and the relationship between sensor spatial and spectral properties, the number of classes mapped, and the achievable map accuracy has been explored for commonly used satellite sensors as well as airborne hyperspectral instruments (Mumby et al., 1997). 
Despite difficulties in comparing results across the relevant spatial resolutions (Capolsini et al., 2003), the results collectively suggest that availability of one or more bands operating in the 400–500 nm wavelength range improves accuracy (Hedley et al., 2012), high spatial resolution improves accuracy (Andréfouët et al., 2003, Mumby and Edwards, 2002), high spectral resolution improves accuracy when relatively few classes are mapped (Capolsini et al., 2003), and airborne hyperspectral sensors, which combine very high spatial and spectral resolution, consistently outperform satellite sensors (Mumby et al., 1997, Kutser et al., 2003, Botha et al., 2013). Many studies have adopted the maximum likelihood algorithm as a standard supervised per-pixel classifier, and rigorous comparisons of a wider range of per-pixel classifiers, as have been carried out elsewhere (Pal and Mather, 2005, Pal, 2005, Brenning, 2009, Duro et al., 2012), have not been performed for tropical nearshore areas. However, it has been demonstrated that the use of contextual editing can improve classification accuracy by providing additional information necessary to separate spectrally similar substrate types, either by re-classing areas in an initial classification result (Benfield et al., 2007), or through pre-segmentation (e.g. by geomorphologic zone) and independent classification of individual segments (Andréfouët et al., 2003, Suzuki et al., 2001). Several studies have also compared the obtainable accuracies between a per-pixel classifier and Object-Based Image Analysis (OBIA) approaches and concluded, despite problems associated with direct comparisons, that OBIA produces substantially more accurate map products (Phinn et al., 2012, Benfield et al., 2007, Leon and Woodroffe, 2011).

High-resolution multispectral satellite sensors (IKONOS, QuickBird, WorldView-2) combine the presence of one or more bands in the 400–500 nm wavelength range with high spatial resolution while being more affordable than airborne hyperspectral data, and these sensors have therefore increasingly been used for mapping of small tropical nearshore areas. However, with a few exceptions (Purkis et al., 2010, Rowlands et al., 2012) Landsat TM/ETM+ data have remained the mainstay of large-scale mapping efforts (UNEP/WCMC, 2010, Andréfouët et al., 2006) due to the large extent of their scenes and their free availability. For the same reasons, the Landsat archive is still the best source of remote sensing data in parts of the world where at-cost satellite data are not a feasible option for mapping projects. However, use of Landsat data for mapping large tropical nearshore areas is challenging for a number of reasons. Data availability is limited because Landsat 5 TM has not produced data for large parts of the world since 2011, Landsat 7 ETM+’s scan line corrector (SLC) malfunctioned on May 31, 2003, after which approximately 22% of the data from each ETM+ scene has been missing, and Landsat 8 has only been producing data since April 11, 2013. Although some continuous coverage has thus been provided, Landsat 7 ETM+ with its SLC malfunction has been the only source of Landsat data for a significant period of time. The frequent presence of clouds in the tropics further limits the availability of useful Landsat scenes, and identification of a single appropriate Landsat scene for a large area of interest may therefore be difficult or impossible. For example, not a single Landsat TM/ETM+ scene exists for the area used in this study (Zanzibar Island, Path 166 Row 64), in which the entire nearshore area is free of clouds. 
This situation is not unique to Zanzibar, and is problematic because the methodology used in the large majority of studies mapping tropical nearshore environments is based on the presence of a single cloud-free scene covering the entire study area. This methodology can be appropriate for small-scale studies for which cloud-free scenes can be obtained (Andréfouët, 2008), but fails with increasing frequency as the size of the study area increases. For remote sensing to realize its full potential in mapping tropical nearshore areas, an alternative methodology that allows utilization of cloud- and error-affected scenes is therefore needed.

For regional- and global-scale terrestrial land cover mapping with coarse-resolution satellite data, the presence of clouds is dealt with by combining cloud- and shadow-free pixels from several scenes to create a cloud-free composite, which then forms the basis for land cover classification (Eva et al., 2004, Latifovic et al., 2004, Bartalev et al., 2003). The success of compositing is heavily dependent on the identification of an appropriate compositing algorithm (Cihlar et al., 1994) used to identify the best pixel from two or more candidate scenes. However, no effective compositing criteria exist for nearshore environments, so an alternative is necessary. Using two Landsat scenes, Leon and Woodroffe (2011) modified the compositing approach by identifying clouds and cloud shadows in a master scene and then replacing the affected pixels with those from a second scene to create a cloud-free composite scene. In addition to accurate cloud- and shadow-detection, the success of this approach depends on effective radiometric normalization, which is likely to be difficult for nearshore areas because these typically comprise a small part of the scene for which normalization statistics are derived and applied, and because differences in tidal stage, turbidity and sea state between scene acquisitions can produce large effects on the observed reflectance spectrum. The presence of residual radiometric effects inherited from individual scenes will reduce accuracy when a single classification algorithm is applied to the composite scene, although the use of OBIA approaches may be effective in reducing this effect (Zhou et al., 2009). More importantly, the use of a single cloud-free composite does not utilize the information available in the unused but cloud-free pixels.
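
The two-scene replacement described above can be sketched as follows. This is a minimal illustration and not the implementation of Leon and Woodroffe (2011): the function and argument names are hypothetical, the cloud/shadow masks are assumed to be given, and the radiometric normalization that the approach depends on is deliberately left out.

```python
import numpy as np

def fill_from_second_scene(master, master_mask, second, second_mask):
    """Fill cloud/shadow pixels of a master scene from a second scene.

    master, second:           float arrays of shape (bands, rows, cols)
    master_mask, second_mask: bool arrays of shape (rows, cols), True = cloud or shadow
    Returns (composite, still_masked), where still_masked marks pixels
    that are cloud- or shadow-affected in both scenes.
    """
    # Replace only where the master is affected and the second scene is clear.
    replace = master_mask & ~second_mask
    # The (rows, cols) mask broadcasts over the leading band axis.
    composite = np.where(replace, second, master)
    still_masked = master_mask & second_mask
    return composite, still_masked
```

Every pixel in the composite comes from exactly one scene, so clear pixels of the second scene that coincide with clear pixels of the master are never used.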
In this study we develop and test an alternative approach: the use of an ensemble classifier that relies on classification results from multiple cloud- and error-contaminated Landsat scenes, all acquired within a period of time in which relevant land cover change can be assumed negligible. We compare a range of statistical and machine-learning classification algorithms, assess the number of scenes needed to optimize the overall classification accuracy, and assess the influence of cloud shadow detection, balancing of the training data, and use of distance-based contextual features on classification accuracy. We demonstrate how the use of multiple Landsat scenes in an ensemble classifier can both achieve complete coverage of the study area and substantially improve classification accuracy compared to the traditional single-scene approach.
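
The idea of combining single-scene classifications by voting can be sketched as below. This is an assumed, unweighted plurality vote for illustration only; the tie-breaking rule, the `NODATA` label and the function name are hypothetical and are not taken from the study.

```python
import numpy as np

NODATA = -1  # hypothetical label for pixels masked as cloud, shadow or sensor error

def vote(class_maps, n_classes, nodata=NODATA):
    """Plurality vote across per-scene class maps.

    class_maps: int array of shape (scenes, rows, cols) holding labels
                0..n_classes-1, or `nodata` where a scene has no valid pixel
    Returns an int array of shape (rows, cols); pixels without any valid
    vote keep the `nodata` label.
    """
    class_maps = np.asarray(class_maps)
    # votes[k] counts the scenes that assigned class k to each pixel.
    votes = np.stack([(class_maps == k).sum(axis=0) for k in range(n_classes)])
    winner = votes.argmax(axis=0)  # ties go to the lowest class index
    return np.where(votes.sum(axis=0) > 0, winner, nodata)
```

A pixel receives a label as soon as one scene provides a valid classification for it, which is how an ensemble of cloudy scenes can cover an area that no single scene covers completely.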

Section snippets

Study area

Zanzibar Island (Unguja) is located in the Western Indian Ocean, 30 km from the Tanzanian mainland, and is surrounded by numerous islets (Fig. 1). Its climate is tropical with two main seasons characterized by northern winds and high temperatures from November to March and southern winds and cooler temperatures from June to October (Ngoile, 1990, Ngusaru, 2002). The tidal range varies between approximately 1.5 m (neap tide) and 4.5 m (spring tide). The nearshore environment around Zanzibar

Methods

The data used in this study consist of 1309 georeferenced benthic substrate data points and 20 Landsat scenes. The field observations (Section 3.1) were split into nine classes according to the dominant substrate at each location. The Landsat data (Section 3.2) were first processed to derive 16 spectral features, which were then used to produce maps of the nine substrate classes using both the traditional single-scene classifiers (Section 3.3) and an ensemble classifier created from the

Classifier comparison

Results from the comparison of the four classifiers, based on valid (cloud- and error-free) validation sites with and without removal of sites affected by tidal mismatch, are shown in Table 3. The random forest classifier achieved the highest overall accuracies for both the individual scenes and the ensemble for both calculations, and was therefore used for the remaining analysis. The removal of sites subject to tidal mismatch increased the accuracy estimate for both the single-scene and

Discussion

The traditional approach to classification, using field observations and a single satellite scene, relies on identification and use of an ‘optimal’ scene while leaving the information available in other scenes unused. For nearshore environments, characteristics of an ‘optimal’ scene will typically include that it is acquired at (spring) low tide, has no clouds, haze or sunglint, and has calm seas and high optical water quality, all uniformly throughout the scene. Although Landsat scenes that

Conclusion

This study demonstrated that multiple sub-optimal Landsat scenes can be used in an ensemble classifier to produce a single land cover map with both complete coverage (not achieved by any of the individual scenes) and improved accuracy (70.4 ± 2.4% for the ensemble vs. 62.9 ± 3.5% for the individual scenes). The ensemble classifier relied on automated processing of individual Landsat scenes, which included calculation of exo-atmospheric reflectance, brightness temperature, band ratios,

Acknowledgements

The study was financially supported by SIDA (Swedish International Development Cooperation Agency). Administrative assistance was provided by the Institute of Marine Science, University of Dar es Salaam, Tanzania.

References (74)

S. Andréfouët et al. Multi-site evaluation of IKONOS data for classification of tropical coral reef environments. Remote Sensing of Environment (2003)
S. Bejarano et al. Combining optical and acoustic data to enhance the detection of Caribbean forereef habitats. Remote Sensing of Environment (2010)
T. Blaschke. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing (2010)
E.J. Botha et al. Increased spectral resolution enhances coral detection under varying water conditions. Remote Sensing of Environment (2013)
A. Brenning. Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection. Remote Sensing of Environment (2009)
D.C. Duro et al. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sensing of Environment (2012)
M. Gullström et al. Assessment of changes in the seagrass-dominated submerged vegetation of tropical Chwaka Bay (Zanzibar) using satellite remote sensing. Estuarine, Coastal and Shelf Science (2006)
J. Hedley et al. Capability of the Sentinel 2 mission for tropical coral reef mapping and coral bleaching detection. Remote Sensing of Environment (2012)
A. Knudby et al. Predictive mapping of reef fish species richness, diversity and biomass in Zanzibar using IKONOS imagery and machine-learning techniques. Remote Sensing of Environment (2010)
A. Knudby et al. A cloud detection algorithm for AATSR data, optimized for daytime observations in Canada. Remote Sensing of Environment (2011)
A. Knudby et al. Simple and effective monitoring of historic changes in nearshore environments using the free archive of Landsat imagery. International Journal of Applied Earth Observation and Geoinformation (2010)
K.E. Kohler et al. Coral Point Count with Excel extensions (CPCe): a Visual Basic program for the determination of coral and substrate coverage using random point count methodology. Computers & Geosciences (2006)
R. Latifovic et al. Land cover mapping of north and central America—Global Land Cover 2000. Remote Sensing of Environment (2004)
Y. Luo et al. Developing clear-sky, cloud and cloud shadow mask for producing clear-sky composites at 250-meter spatial resolution for the seven MODIS land bands over Canada and North America. Remote Sensing of Environment (2008)
G. Mountrakis et al. Support vector machines in remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing (2011)
P. Mumby et al. Mapping marine environments with IKONOS imagery: enhanced spatial resolution can deliver greater thematic accuracy. Remote Sensing of Environment (2002)
A.W. Mwandya et al. Spatial and seasonal variations of fish assemblages in mangrove creek systems in Zanzibar (Tanzania). Estuarine, Coastal and Shelf Science (2010)
S. Purkis et al. Enhanced detection of the coral Acropora cervicornis from satellite imagery using a textural operator. Remote Sensing of Environment (2006)
G. Rowlands et al. Satellite imaging coral reef resilience at regional scale. A case-study from Saudi Arabia. Marine Pollution Bulletin (2012)
J.R. Schott et al. Radiometric scene normalization using Pseudoinvariant features. Remote Sensing of Environment (1988)
J. van Hulse et al. Knowledge discovery from imbalanced and noisy data. Data & Knowledge Engineering (2009)
W. Zhou et al. Object-based land cover classification of shaded areas in high spatial resolution imagery of urban areas: a comparison study. Remote Sensing of Environment (2009)
W. Ahmad et al. An evaluation of Landsat Thematic Mapper (TM) digital data for discriminating coral reef zonation: Heron Reef (GBR). International Journal of Remote Sensing (1994)
S. Andréfouët. Coral reef habitat mapping using remote sensing: a user vs. producer perspective. Implications for research, management and capacity building. Journal of Spatial Science (2008)
S. Andréfouët et al. Coral reef distribution, status and geomorphology–biodiversity relationship in Kuna Yala (San Blas) archipelago, Caribbean Panama. Coral Reefs (2005)
S. Andréfouët et al. Global assessment of modern coral reef extent and diversity for regional science and management applications: a view from space.
S. Bartalev et al. A new SPOT4-VEGETATION derived land cover map of Northern Eurasia. International Journal of Remote Sensing (2003)
S.L. Benfield et al. Mapping the distribution of coral reefs and associated sublittoral habitats in Pacific Panama: a comparison of optical satellite sensors and classification methodologies. International Journal of Remote Sensing (2007)
L. Breiman. Random forests. Machine Learning (2001)
P. Capolsini et al. A comparison of Landsat ETM+, SPOT HRV, IKONOS, ASTER, and airborne MASTER data for coral reef habitat mapping in South Pacific islands. Canadian Journal of Remote Sensing (2003)
J. Cihlar et al. Evaluation of compositing algorithms for AVHRR data over land. IEEE Transactions on Geoscience and Remote Sensing (1994)
C. Cortes et al. Support-vector networks. Machine Learning (1995)
S. Džeroski et al. Is combining classifiers with stacking better than selecting the best one? Machine Learning (2004)
B. Efron et al. An Introduction to the Bootstrap (1993)
B. Efron et al. A Leisurely look at the bootstrap, the Jackknife, and cross-validation. American Statistician (1983)
H. Eva et al. A land cover map of South America. Global Change Biology (2004)
A.E. Gill. Atmosphere–Ocean Dynamics (1982)
