Methods for improving accuracy and extending results beyond periods covered by traditional ground-truth in remote sensing classification of a complex landscape

https://doi.org/10.1016/j.jag.2015.01.001

Highlights

  • Classified 99% of the complex landscape of western Oregon into 57 landuse categories.

  • Majority-rule within previously defined polygons and over time improved accuracy.

  • Accurate classifications were achieved without current year ground-truth.

  • Infrequently occurring annually disturbed agriculture classes tended to be lost.

  • Synthetic ground-truth generated from 2005–2011 allowed classification of 2004.

Abstract

Successful development of approaches to quantify impacts of diverse landuse and associated agricultural management practices on ecosystem services is frequently limited by lack of historical and contemporary landuse data. We hypothesized that ground-truth data from one year could be used to extrapolate previous or future landuse in a complex landscape where cropping systems do not generally change greatly from year to year because the majority of crops are established perennials or the same annual crops grown on the same fields over multiple years. Prior to testing this hypothesis, it was first necessary to classify 57 major landuses in the Willamette Valley of western Oregon from 2005 to 2011 using normal same-year ground-truth, elaborating on previously published work and traditional sources such as Cropland Data Layers (CDL) to more fully include minor crops grown in the region. Available remote sensing data included Landsat, MODIS 16-day composites, and National Agriculture Imagery Program (NAIP) imagery, all of which were resampled to a common 30 m resolution. The frequent presence of clouds and Landsat 7 scan line gaps forced us to conduct a series of separate classifications in each year, which were then merged by choosing whichever classification used the highest number of cloud- and gap-free bands at any given pixel. Procedures adopted to improve accuracy beyond that achieved by maximum likelihood pixel classification included majority-rule reclassification of pixels within 91,442 Common Land Unit (CLU) polygons, smoothing and aggregation of areas outside the CLU polygons, and majority-rule reclassification over time of forest and urban development areas. Final classifications in all seven years separated annually disturbed agriculture, established perennial crops, forest, and urban development from each other at 90 to 95% overall 4-class validation accuracy.
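
The merging and polygon majority-rule steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat per-pixel lists, and the tie-breaking behaviour (first candidate or first label encountered wins) are assumptions made only for illustration, and the class labels are placeholders.

```python
from collections import Counter


def merge_by_clear_bands(candidates):
    """Merge several classifications of the same scene, pixel by pixel.

    candidates: list of (labels, clear_band_counts) pairs, where both
    members are flat lists with one entry per pixel (hypothetical layout).
    At each pixel the label is taken from the candidate that used the most
    cloud- and gap-free bands; ties go to the earliest candidate.
    """
    n_pixels = len(candidates[0][0])
    merged = []
    for i in range(n_pixels):
        best_labels, _ = max(candidates, key=lambda c: c[1][i])
        merged.append(best_labels[i])
    return merged


def polygon_majority(labels, polygon_ids):
    """Reassign each pixel inside a polygon to that polygon's modal label.

    polygon_ids holds one polygon identifier per pixel, or None for pixels
    outside every polygon; those pixels are returned unchanged.
    """
    votes = {}
    for label, pid in zip(labels, polygon_ids):
        if pid is not None:
            votes.setdefault(pid, Counter())[label] += 1
    winners = {pid: c.most_common(1)[0][0] for pid, c in votes.items()}
    return [
        winners[pid] if pid is not None else label
        for label, pid in zip(labels, polygon_ids)
    ]


# Two candidate classifications of four pixels, with clear-band counts.
a = (["wheat", "wheat", "forest", "forest"], [6, 2, 5, 5])
b = (["ryegrass", "ryegrass", "forest", "urban"], [3, 7, 5, 6])
merged = merge_by_clear_bands([a, b])

# Three pixels share polygon 1; the last two lie outside any polygon.
smoothed = polygon_majority(
    ["wheat", "ryegrass", "wheat", "forest", "urban"],
    [1, 1, 1, None, None],
)
```

In this toy case the single ryegrass pixel inside polygon 1 is outvoted by its two wheat neighbours, which is the kind of within-field noise that polygon majority rule is meant to remove.
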
In the most successful use of subsequent year ground-truth data to classify prior year landuse, an overall 57-class accuracy of 75% was achieved despite the omission of 10 entire classes, most of which were annually disturbed or perennial crops grown on very few fields. Synthetic ground-truth data for the 2004 harvest year based on the most common landuse classes over the following 7 years classified 49 of 57 categories at an overall accuracy of 96% in a final version that included CLU polygon majority rule, default smoothing and aggregation, and forcing of urban development and forest from multi-year majority-rule.
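
One way to picture the synthetic ground-truth idea is as a per-field vote across years. The sketch below is a hedged illustration only: the function name, the dictionary layout, the example crop labels, and the `min_share` threshold are all assumptions and are not taken from the paper, which states only that the most common landuse classes over the following 7 years were used.

```python
from collections import Counter


def synthetic_ground_truth(history, min_share=0.0):
    """Derive one label per field from its labels in other years.

    history: dict mapping a field identifier to the list of class labels
    observed in each available year (hypothetical layout).
    min_share: hypothetical filter; a field is kept only if its modal
    class accounts for at least this fraction of the years.
    """
    synthetic = {}
    for field_id, labels in history.items():
        if not labels:
            continue  # no observations, so nothing to extrapolate
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_share:
            synthetic[field_id] = label
    return synthetic


# Seven years of labels (2005 to 2011) for three illustrative fields.
history = {
    "f1": ["tall fescue"] * 6 + ["wheat"],
    "f2": ["wheat", "clover", "wheat", "peas", "wheat", "clover", "wheat"],
    "f3": ["peas", "wheat", "clover", "peas", "corn", "wheat", "mint"],
}
stable_only = synthetic_ground_truth(history, min_share=0.5)
```

Field `f3` rotates through many annual crops, so no class reaches half of the years and it drops out when `min_share=0.5`. This mirrors the observation in the highlights that infrequently occurring, annually disturbed classes tend to be lost.
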

Introduction

Given current trends in species’ extinction rates, climate change, and challenges in feeding increasing human populations, questions regarding sustainability of agricultural production systems must quickly move from realms dominated by political rhetoric to ones of insightful measurement. Doing so will require collective commitment to a rigorous synthesis of modelling, monitoring, and mitigation efforts. An example of such endeavours is the USDA Conservation Effects Assessment Project (CEAP), which attempts to quantify environmental impacts of government-sponsored conservation programs and on-farm production practices used in modern agriculture (Mausbach and Dedrick, 2004, Schnepf and Cox, 2007, Anonymous, 2009).

Monitoring and modelling of agricultural production in the United States face multiple challenges. Methods must simultaneously account for several major factors: (1) variation in temperature, humidity, precipitation, and wind patterns among the nation’s main agricultural production regions, (2) variation in soil fertility, drainage, water holding capacity, health, and availability of irrigation among production sites within regions, and (3) spatial and temporal differences at sites in the prevalence of insects, weeds, and diseases. Inherent challenges in accounting for these sources of variation in models are compounded by the fact that detailed information on field locations, crop choice, and general agronomic practices employed in production collected by the USDA-National Agricultural Statistics Service (NASS) is viewed as confidential business information by congressional mandate, and is therefore only publicly available in aggregated summary format. Policymakers should greatly benefit from information describing real-world relationships between crop production practices and measurable changes to wildlife diversity and abundance, water quality, and soil health/fertility. Such data would be useful in defining trade-offs among ecosystem services and applying this knowledge to identify optimal sets of practices that retain a healthy environment in the long term without compromising the availability of food and fibre for humankind in the short term.

Increased availability of satellite imagery following the decision by USGS in 2008 to provide free access to the Landsat archives overcame a major hurdle frequently mentioned regarding remote sensing classification, the cost of acquiring the necessary images (Hansen and Loveland, 2012, Lillesand et al., 2004, Lunetta et al., 2006, Thoma et al., 2004, Vogelmann et al., 1998, Zhang et al., 2009, Aplin et al., 1999). This action provoked numerous studies to classify land use within local watersheds utilizing a combination of satellite imagery and available ground-truth survey data, thereby achieving more detailed and accurate classifications (Gitau et al., 2010, Hansen and Loveland, 2012, Julien et al., 2011, Srivastava et al., 2012, Zhu et al., 2012).

In order to maintain a consistent catalogue of crop cover and landuse, Cropland Data Layers (CDL) were created by NASS using confidential farm survey information as ground-truth data in conjunction with a variety of remotely sensed data, including Landsat and MODIS imagery. While CDLs provide some of the needed data for effective monitoring of the impact of agriculture on ecosystem services, their utility is limited by several factors. First, they are not produced annually for all states and crops (e.g., only about once every 3 years in the Pacific Northwest [PNW]). Second, they omit many details that are critical for evaluating the ecological impact of minor crops. Minor crops receiving very different management inputs (such as tillage or fertilization practices) and producing outputs varying widely in terms of water quality and wildlife habitat suitability (Young, 2008) are often lumped into a single CDL category (e.g., grass seed crops, class 59) (Anonymous, 2007).

Communication with PNW researchers studying phenomena ranging from pollinator decline to avian diversity revealed strong interest in a more thorough landuse classification scheme, one that would provide nearly complete coverage of the landscape and extend several decades into the past. Subjects already investigated included spatial patterns in clustering of weeds across landscapes and potential locations for biomass to energy conversion plants (Mueller-Warrant et al., 2008, Mueller-Warrant et al., 2010). Other research found significant relationships between percentages of sub-basin areas in particular crops and concentrations of nitrogen and sediment in streams and rivers draining sub-basins in western Oregon (Mueller-Warrant et al., 2012, Nelson, 2005, Nelson et al., 2006, Steiner et al., 2004, Wentz et al., 1998). While these methods applied to a high percentage of the agricultural landscape in Linn County, the heart of Oregon’s grass seed industry, the 16 classes previously used by Mueller-Warrant et al. (2012) failed to include about a third of agricultural landuse in other western Oregon counties. They also did not cover forests or cities, for which static data from the National Land Cover Dataset (NLCD) (Homer et al., 2004) were relied upon.

The ground-truth data on which the 16-category classification for 2005 through 2007 was based were sufficiently detailed to define many additional minor crops grown in the region. Our ground-truth data now extend over a 7-year time period, creating the possibility of nearly complete maps of agriculture, urban development, and forest landuse over multiple years. Given the general stability of urban and forest landuse, we assumed that data from NLCD could provide satisfactory ground-truth for major classes of urban development and forest landuse, supplementing our in-house agricultural ground-truth geographic information system (GIS). These combined sources of ground-truth provide an opportunity for testing a variety of ‘boot-strap’ approaches that might reduce the effort and expense in obtaining adequate quantity and quality of future ground-truth data. The same approaches could also serve as mechanisms to look backwards in time to classify landuse in years for which Landsat data are present but ground-truth data are lacking.

Our primary objective was to test the hypothesis that ground-truth data from one year could identify landuses in previous or subsequent years in a complex landscape where a majority of landuse practices do not change from year to year because crops are established perennials or the same annual crops grown in the same fields over multiple years. Before this objective could be tested, it was first necessary to create a detailed, accurate characterization of agricultural production and other landuses in the Willamette Valley of western Oregon, one in which 57 major landuses from 2005 to 2011 were classified using normal same year ground-truth.

Section snippets

Study area

This research was conducted across a 25,303 km² area of the Willamette River basin and nearby drainages in western Oregon and south-western Washington, bounded on the east, west, and south by consistent footprints of Landsat imagery and on the north by the edge of the greater Portland metropolitan area. The study area included all of the traditional major and minor crop production in the Willamette Valley and much of the forests on west slopes of the Cascade and east slopes of the Coastal mountain

Ground-truth data

We used drive-by surveys of agricultural fields in four western Oregon counties during cropping years ending with harvests in July of 2005, 2006, 2007, 2008, 2009, 2010, and 2011 to develop a comprehensive list of crops, stand establishment practices and conditions, and post-harvest residue management methods (Mueller-Warrant et al., 2011 and 2012). Approximately 7100 fields were surveyed in western Oregon counties each growing season for the first four years, and 3200 fields from the 2009

Success in classifying 57 landuses

All 57 desired classes were generated by normal remote sensing classifications in each year except for the 2005 and 2006 harvest years. In the first of those years, spring-plant peas and unidentified grass seed crops (# 41) and new plantings of blueberries, hops, and poplars (# 42) were lost, while in the second year corn or sudangrass (# 27) was lost (Table 1). Training set accuracies ranged from 54.3 to 70.1% and exceeded test validation accuracies by a median value of only 0.4%. In all

Performance of subsequent year information as ground-truth data

Potential concerns over use of shift-year ground-truth data for remote-sensing classification include both loss of individual categories and reduced overall classification accuracy. Because a certain amount of landuse change is inevitable over time, previous or subsequent year ground-truth data will generally contain more errors than would be found in current year ground-truth data. It is possible to measure the accuracy of shift-year classifications against several ground-truth alternatives.
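
Both concerns named above, lost categories and reduced overall accuracy, can be quantified with very little machinery. The following is a minimal sketch under stated assumptions: the function names, the example labels, and the grouping of detailed classes into four broad classes are illustrative only and do not reproduce the paper's validation procedure.

```python
def overall_accuracy(predicted, reference, group=None):
    """Fraction of validation pixels whose predicted class matches.

    group: optional dict mapping each detailed class to a broader class
    (hypothetical), so the same pixels can be scored at a coarser level.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if group is not None:
        predicted = [group[p] for p in predicted]
        reference = [group[r] for r in reference]
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def lost_classes(predicted, reference):
    """Classes present in the reference data but never predicted."""
    return set(reference) - set(predicted)


reference = ["wheat", "peas", "fescue", "forest", "urban", "fescue"]
predicted = ["wheat", "wheat", "fescue", "forest", "urban", "ryegrass"]

# Illustrative mapping of detailed classes to four broad classes.
four_class = {
    "wheat": "annual", "peas": "annual",
    "fescue": "perennial", "ryegrass": "perennial",
    "forest": "forest", "urban": "urban",
}

detailed = overall_accuracy(predicted, reference)
broad = overall_accuracy(predicted, reference, group=four_class)
missing = lost_classes(predicted, reference)
```

Here two of six pixels are wrong at the detailed level, yet both errors stay inside their broad class, so the 4-class score is perfect while `peas` is reported as a lost class. Scoring the same predictions against different reference sets (for example current-year versus shift-year ground-truth) only requires changing the `reference` argument.
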

Conclusions

The possibility of using our extensive set of GIS ground-truth data and our multi-year series of successful remote sensing classifications to extend knowledge of where specific crops have been grown retrospectively across the whole period of archived Landsat images is intriguing. If successful, such data could be applied to assist the Soil and Water Assessment Tool (SWAT) in modeling the historical distribution of sediment and nutrients in western Oregon streams and rivers to better understand

Acknowledgements

Contribution of USDA-ARS. The use of trade, firm, or corporation names in this publication (or page) is for the information and convenience of the reader. Such use does not constitute an official endorsement or approval by the United States Department of Agriculture or the Agricultural Research Service of any product or service to the exclusion of others that may be suitable. The authors wish to express their thanks to the USDA Cooperative State Research, Education, and Extension Service for

References (26)

  • T.M. Lillesand et al. Remote Sensing and Image Interpretation (2004)

  • P.M. Mather. Computer Processing of Remotely-Sensed Images, An Introduction (2004)

  • M.J. Mausbach et al. The length we go, measuring environmental benefits of conservation practices. J. Soil Water Conserv. (2004)