Original papers
Error analysis and correction of spatialization of crop yield in China – Different variables scales, partitioning schemes and error correction methods

https://doi.org/10.1016/j.compag.2018.03.031

Highlights

  • We explored the influence of variable scales on the precision of crop yield spatialization.

  • We examined the relationship between partitioning schemes and the precision of crop yield spatialization.

  • We compared the pros and cons of seven different error correction methods.

Abstract

Spatializing crop yield facilitates the integrated analysis of data from different disciplines. Multivariable linear regression models are often applied to the spatialization of attribute data. The variable scale and the partitioning of China should be considered when such a model is constructed, because different variable scales and partitioning schemes inevitably result in different spatialization errors. These errors can be reduced by error correction methods, and different methods affect the accuracy of crop yield spatialization differently. In this study, three variable scales were selected: prefectural scale, county scale and grid cell (1 km × 1 km). Five partitioning schemes (no partition of China, 7 regions of China, 9 regions of China, 10 regions of China, and partition of China by province) were considered. A total of 28 multivariable linear regression models were constructed with the areas of different types of farmland as independent variables and crop yield as the dependent variable. Seven error correction methods were then used to correct the crop yield spatialization results. Three error evaluation indicators were selected to investigate the influence of variable scales, partitioning schemes and error correction methods on the precision of the spatialization results. The conclusions are as follows: (a) Nine models with intercept based on variables at regional scale could not be used to spatialize crop yield, while the others could. (b) The precision of the spatialization result based on a model without intercept is higher than that based on a model with intercept. (c) For models without intercept, the precision of spatialization results first increased and then decreased as the partitioning scheme was refined. (d) For models without intercept, the precision of spatialization results improved as the variable scale was refined from prefectural scale to county scale and grid scale. (e) Among the seven error correction methods, the average correction method, weight coefficient correction method II and weight coefficient correction method III cannot be used to correct initial spatialization results. (f) The proportional coefficient correction method and weight coefficient correction methods I, IV and V can be used to correct initial spatialization results. (g) The error correction methods that do improve the initial spatialization products yield corrected products of very similar precision. This research addresses the lack of spatial error analysis for crop yield, explores the relationship between sample scales, partitioning schemes and spatialization error, and compares the pros and cons of different error correction methods. It also provides a useful reference for other types of socio-economic statistical data.

Introduction

Against a backdrop of global environmental dynamism and climate change, traditional geo-ecological processes have undergone drastic changes over the past few decades. Geographical processes are no longer simple natural processes, and research on ecological processes is no longer confined to the dynamics and development within ecosystems. The integration and intersection of multiple disciplines is becoming an important characteristic of modern geo-ecological research (Fu et al., 2006).

Applying statistics to the study of geo-ecological processes is an important sign of the coupling between human activities and those processes. Socio-economic statistics are collected and published by administrative division, so they have low spatial resolution and do not describe how the quantities are distributed in space. This makes it difficult to analyze them together with other data in practice, which greatly limits their use in geographical research. There are three major problems: first, the contradiction between the spatial heterogeneity of geographical elements and the homogeneity of statistics within the same administrative division; second, the mismatch between landscape scale and statistical scale; third, the inconsistency of statistical indicators across regions (Liu and Li, 2012). The spatialization of socio-economic statistics can solve these problems effectively (Liao and Zhang, 2009).

Numerous studies have focused on the spatialization of socio-economic statistics, including the spatialization of population (Tobler et al., 1995, Tobler et al., 1997, Sutton et al., 2001, Tian et al., 2005) and gross domestic product (GDP) statistics (Ebener et al., 2005, Doll et al., 2006, Sutton et al., 2007, Elvidge et al., 1997, Elvidge et al., 2009a, Elvidge et al., 2009b and Ghosh et al., 2009). With the rapid development of Remote Sensing (RS) and Geographic Information System (GIS) technology, the spatialization of agricultural production data has been studied frequently, mainly the spatialization of crop acreage (Qiu et al., 2003, Leff et al., 2004, You and Wood, 2006, You et al., 2009, Monfreda et al., 2008, Khan et al., 2010, Zhang et al., 2013, Jin et al., 2015, Salmon et al., 2015, Liu et al., 2017) and agricultural production inputs (Potter et al., 2010, Sun et al., 2010, Yan and Pan, 2014). However, there has been less research on crop yield spatialization. For instance, Shi et al. used cultivated land data to spatialize statistics of maize yield per unit area with a multivariable linear regression model, and obtained a spatial distribution map of maize yield per unit area in Jilin province (Shi et al., 2011). Liu et al. took population density as the dependent variable and crop yield as independent variables to construct a regression model with the support of land use data. The model was then applied to spatialize provincial-level crop yield statistics, resulting in a distribution map of crop yield in China at 1 km by 1 km for 2000, and the precision of the spatialization results was analyzed from provincial scale down to prefectural and county scale (Liu and Li, 2012). But few studies have explored the influence of variable scales and partitioning schemes on the precision of crop yield spatialization.

As a frequently used geo-data processing method, the spatialization of attribute data inevitably introduces errors. These errors can be reduced by correcting the initial spatialization results. Many error correction methods have been used for this purpose, such as the average correction method (Wu et al., 2015), the proportional coefficient correction method (Shi et al., 2016), and the weight coefficient correction method based on the idea that different farmland types have the same weight (Liao and Qin, 2014). However, few studies have compared the pros and cons of different error correction methods. In this study, we therefore discuss the influence of some new error correction methods on crop output spatialization and compare them with the existing methods in order to improve spatialization precision.

This study attempts to simulate the spatial distribution of crop yield in China using land use data, with the following objectives: (1) exploring the influence of variable scales on the precision of crop yield spatialization; (2) detecting the influence of partitioning schemes on the precision of crop yield spatialization; and (3) comparing the pros and cons of different error correction methods.

Data sources

Five datasets are used for this study.

  1. County-level and prefecture-level crop yield statistics of China in 2010. The data come from the Statistical Yearbook of China in 2011.

  2. Land use dataset of China in 2010. The dataset is provided by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) (http://www.resdc.cn).

  3. County-level administrative map of China in 2010. It mostly includes vector data of county-level administrative boundaries in China and other attribute data,

Research method

Crop output is proportional to farmland area, and different farmland types influence crop output differently. The multivariate linear regression analysis method (MLRAM) is the most frequently used method for spatializing attribute data, so we chose MLRAM to spatialize crop output. Its basic formula is as follows:

Supposing one dependent variable y is affected by k independent variables (x1, x2, …, xk), and there are n groups of observed values (ya, x1a, x2a, …, xka), a = 1, 2, …,
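The regression setup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names fit_linear and predict, the two farmland types, and all numbers are hypothetical, and ordinary least squares via numpy.linalg.lstsq is assumed as the fitting procedure. It shows how a model with intercept and a model without intercept are fitted to the same sample of administrative units.

```python
import numpy as np

def fit_linear(areas, yields, intercept=True):
    """Ordinary least-squares fit of crop yield on farmland areas.

    areas  : (n, k) array, n administrative units by k farmland types
    yields : (n,) array of crop yield statistics
    Returns the coefficient vector (intercept first when intercept=True).
    """
    X = np.asarray(areas, dtype=float)
    y = np.asarray(yields, dtype=float)
    if intercept:
        # Prepend a column of ones so the first coefficient is the intercept.
        X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, areas, intercept=True):
    """Apply fitted coefficients to new units, for example grid cells."""
    X = np.asarray(areas, dtype=float)
    if intercept:
        return coef[0] + X @ coef[1:]
    return X @ coef

# Hypothetical sample: four units, two farmland types (areas in km^2).
areas = np.array([[10.0, 5.0], [4.0, 12.0], [8.0, 8.0], [2.0, 3.0]])
# Synthetic yields generated without any intercept (illustrative only).
yields = areas @ np.array([6.0, 3.0])

coef_no_int = fit_linear(areas, yields, intercept=False)
coef_int = fit_linear(areas, yields, intercept=True)

# A grid cell that contains no farmland at all.
empty_cell = np.array([[0.0, 0.0]])
yield_empty = predict(coef_no_int, empty_cell, intercept=False)
```

In this sketch a model without intercept always assigns zero yield to a cell with no farmland, whereas a model with a nonzero intercept would assign that intercept to every cell regardless of land use. Whether this explains the precision differences reported in the abstract is not stated in the text shown here.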

Error correction

Error is the difference between the simulated value of the model and the actual observed value. The basic formula is as follows: ε = y − y_i, where ε represents the error, y is the statistical value and y_i is the simulated value.

The purpose of error correction is to improve spatialization precision by assigning errors to initial spatialization results based on some methods, such as average correction method (Wu et al., 2015), proportional coefficient correction method (Shi et al., 2016), weight coefficient correction method based
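As a rough illustration of assigning the error back to the initial results, the Python sketch below implements two simple schemes. The text shown here names the average and proportional coefficient correction methods but does not define them, so both functions are assumed interpretations: the average variant spreads ε equally over the grid cells of an administrative unit, and the proportional variant rescales the cells so that their sum matches the statistic. The simulated value y_i is assumed to be the sum of the cell estimates within the unit, and all names and numbers are hypothetical.

```python
import numpy as np

def residual(statistic, cell_estimates):
    """epsilon = y - y_i for one unit, with y_i taken as the sum of its cell estimates."""
    return statistic - float(np.sum(cell_estimates))

def average_correction(statistic, cell_estimates):
    """Assumed reading: add an equal share of the error to every cell."""
    cells = np.asarray(cell_estimates, dtype=float)
    return cells + residual(statistic, cells) / cells.size

def proportional_correction(statistic, cell_estimates):
    """Assumed reading: rescale cells so that their sum equals the statistic."""
    cells = np.asarray(cell_estimates, dtype=float)
    total = cells.sum()
    if total == 0:
        # Nothing to rescale; return the estimates unchanged.
        return cells.copy()
    return cells * (statistic / total)

# Hypothetical unit with three grid cells; the first cell has no farmland.
cells = np.array([0.0, 20.0, 30.0])  # initial estimates (tonnes)
stat = 60.0                          # published statistic for the unit (tonnes)

avg = average_correction(stat, cells)
prop = proportional_correction(stat, cells)
```

Both variants reproduce the unit total. Under these assumed definitions they differ in that the average variant assigns some yield to the cell that started at zero, while the proportional variant leaves it at zero.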

Conclusions

In this paper, three variable scales were selected: prefectural scale, county scale and grid cell (1 km × 1 km). Five partitioning schemes (no partition of China, 7 regions of China, 9 regions of China, 10 regions of China, and partition of China by province) were considered. A total of 28 multivariable linear regression models were constructed with the areas of different types of farmland as independent variables and crop yield as the dependent variable. Then, seven kinds of error

Acknowledgements

This work was supported by the National Key R&D Program of China [Grant number 2016YFA0602702].

References (39)

  • C.D. Elvidge et al. Relation between satellite observed visible-near infrared emissions, population, economic activity and electric power consumption. Int. J. Remote Sens. (1997)
  • C.D. Elvidge et al. A fifteen year record of global natural gas flaring derived from satellite data. Energies (2009)
  • B. Fu et al. Progress and perspective of geographical-ecological processes. Acta Geogr. Sinica (2006)
  • T. Ghosh et al. Estimation of Mexico’s Informal Economy and Remittances Using Nighttime Imagery. Remote Sensing (2009)
  • X. Jin et al. Farmland dataset reconstruction and farmland change analysis in China during 1661–1985. J. Geogr. Sci. (2015)
  • B. Leff et al. Geographic distribution of major crops across the world. Global Biogeochem. Cycles (2004)
  • S. Liao et al. A spatialization method for survey data of theoretical stock-carrying capacity of grassland in China and its application. Geogr. Res. (2014)
  • S. Liao et al. Study on error evaluating index for spatialisation of attribute data. J. Geo-inform. Sci. (2009)
  • Z. Liu et al. Spatial distribution of China crop output based on land use and population density. Trans. Chinese Soc. Agric. Eng. (Trans. CSAE) (2012)