Normalization of data for delineating management zones

doi:10.1016/j.compag.2017.10.017

Computers and Electronics in Agriculture

Volume 143, December 2017, Pages 238-248

https://doi.org/10.1016/j.compag.2017.10.017

Abstract

Management zones (MZs) are an economically viable alternative to variable-rate application (VRA) based on prescription maps because, unlike VRA, MZs can be managed with conventional machinery, which gives them a low initial cost and a high return in economic and environmental benefits. Data clustering techniques, and the Fuzzy C-Means algorithm in particular, are the most widely used processes for delineating MZs. The most common similarity measure is Euclidean distance; however, because the algorithm is sensitive to the range of the input variables, these variables are typically normalized by dividing each value by the standard deviation, maximum value, average, or range of the data set. The objective of this study was to assess the influence of data normalization methods on the delineation of MZs. The experiment was conducted between 2010 and 2014 in three experimental fields of 9.9, 15.0, and 19.8 ha located in Southern Brazil. The variables used for delineating MZs were selected using spatial correlation statistics, and the data were normalized using the standard score, range, and average methods. The MZs were delineated using the Fuzzy C-Means algorithm, which created two, three, and four clusters. The normalization methods were evaluated using five indices (modified partition entropy [MPE], fuzziness performance index [FPI], variance reduction [VR], smoothness index [SI], and kappa) and ANOVA. It was found that normalization is required when MZ delineation uses more than one variable with different scales in a clustering process based on Euclidean distance. The range method was considered the best normalization method overall.

Introduction

The study of the spatial distribution of soil and plant variables is important for establishing appropriate management zones (MZs) for fertilizer application, soil management, and irrigation. Appropriate MZs may maximize yield while reducing costs and minimizing potential environmental damage (Tilman et al., 2011, Li et al., 2013, Bansod and Pandey, 2013, Hedley, 2015).

An MZ is defined as a subregion of a field that exhibits similar combinations of yield-limiting factors (Tagarakis et al., 2013). MZs facilitate the application of precision agriculture (PA) techniques by reducing the costs of adoption and implementation, since they can be managed with constant-rate equipment and may reduce the number of samples needed to characterize soil nutrient availability. Delineating MZs is not a simple task because numerous variables may influence crop yield. Because an MZ is often used for several years, the variables considered should be temporally stable (Doerge, 2000) and correlated with yield. Among the variables identified in the literature as having good potential for delineating temporally stable MZs are elevation (Bazzi et al., 2015, Fraisse et al., 2001, Jaynes et al., 2005, Peralta and Costa, 2013, Farid et al., 2016, Schepers et al., 2004), soil electrical conductivity (ECa) (Li et al., 2007, Farid et al., 2016), soil penetration resistance (Gavioli et al., 2016), and soil texture (Farid et al., 2016).

Techniques such as principal component analysis (PCA) (Bansod and Pandey, 2013) and Moran's bivariate spatial autocorrelation statistic, proposed by Czaplewski and Reich (1993) and used by Reich et al. (1994) and Bonham et al. (1995), can be used to create (in the case of PCA) or select layers for delineating MZs. When more than one crop is cultivated in the same field during the year, which is a common practice in Brazil, normalizing yield data makes it possible to create a more representative variable (Bunselmeyer and Lauer, 2015) for use in ANOVA and Tukey's test.

Several techniques for delineating MZs have been proposed in the literature (Pedroso et al., 2010, Xiang et al., 2007); however, the most widely used is cluster analysis (Li et al., 2007, Iliadis et al., 2010). The clustering methods most commonly used to delineate MZs are the K-means algorithm (Rodrigues Junior et al., 2011, Ortega and Santibáñez, 2007) and fuzzy C-means (Li et al., 2007, Li et al., 2013, Fu et al., 2010, Zhang et al., 2013, Moral et al., 2010). Fuzzy C-means, which incorporates fuzzy logic into the partitioning algorithm, uses a weighting exponent to control the degree of sharing between classes (Bezdek, 1981). This allows individuals to exhibit partial membership in each of the classes, which is important when dealing with the continuous variability of natural phenomena (Burrough, 1989). Before a dataset can be clustered, it is necessary to establish an appropriate measure of similarity. Euclidean distance is the most commonly used; this measure gives equal weight to all measured variables and is sensitive to correlated variables (Bezdek, 1981). In geometrical terms, Euclidean distance creates clusters with a spherical shape, which rarely occur in soil (Odeh et al., 1992). Fridgen et al. (2004) report that Euclidean distance should be used only for statistically independent variables with equal variances. Consequently, when Euclidean distance is used for clustering, data normalization can be a very important step before creating MZs.
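The sensitivity of Euclidean distance to variable scale can be illustrated with a small sketch. The sampling points, units, and field-wide minimum and maximum values below are hypothetical and are not taken from the study; the helper names `euclidean` and `range_scale` are introduced only for this illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def range_scale(point, mins, maxs):
    """Rescale each variable to [0, 1] using assumed field-wide minima and maxima."""
    return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(point, mins, maxs))

# Hypothetical sampling points: (elevation in m, ECa in mS/m).
a = (650.0, 5.0)
b = (655.0, 5.0)  # slightly higher elevation, same ECa
c = (650.0, 9.0)  # same elevation, much higher ECa

# On the raw scale, the elevation difference dominates the distance.
d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# Assumed field-wide ranges: elevation 640-700 m, ECa 4-10 mS/m.
mins, maxs = (640.0, 4.0), (700.0, 10.0)
a_n, b_n, c_n = (range_scale(p, mins, maxs) for p in (a, b, c))
d_ab_n = euclidean(a_n, b_n)
d_ac_n = euclidean(a_n, c_n)

print(f"raw:        d(a,b)={d_ab:.3f}  d(a,c)={d_ac:.3f}")
print(f"normalized: d(a,b)={d_ab_n:.3f}  d(a,c)={d_ac_n:.3f}")
```

With these made-up values, point b is farther from a than c is on the raw scale, but the order reverses once both variables share a common scale, so a clustering step based on this distance may group the points differently.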

The standard score (Z-score) normalization method (Eq. (1)) has been used by many researchers for delineating MZs (Anderberg, 1973, Romesburg, 1984, Larscheid and Blackmore, 1996, Stafford et al., 1996, Molin, 2002, Kitchen et al., 2005). This method transforms normal variables to standard scores, so that the transformed variable has a mean of 0.0 and a variance of 1.0.

$Z = \frac{X - \overline{X}}{s} \qquad (1)$

where X is the original data value, $\overline{X}$ is the sample average, and s is the standard deviation.
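A minimal sketch of Eq. (1) follows. The elevation values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics

def standard_score(values):
    """Eq. (1): subtract the sample average and divide by the standard deviation."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(x - mean) / s for x in values]

# Hypothetical elevation readings (m) at five sampling points.
elevation = [650.0, 655.0, 661.0, 670.0, 684.0]
z = standard_score(elevation)

print([round(v, 3) for v in z])
print("mean:", round(statistics.fmean(z), 6), "variance:", round(statistics.variance(z), 6))
```

The transformed values have a mean of 0 and a variance of 1 (up to floating-point rounding), as stated above.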

Several researchers have reported using the average method (Eq. (2)) for delineating MZs (Stafford et al., 1996, Molin, 2002, Kitchen et al., 2005), on the assumption that the average represents the dataset well. However, the average is sensitive: it is modified by adding any constant to the data, which can easily change the distribution of the normalized data (Anderberg, 1973).

$Z = \frac{X}{\overline{X}} \qquad (2)$
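The effect of an added constant on Eq. (2) can be sketched as follows. The ECa values and the offset of 100 are hypothetical, and the function assumes a non-zero average.

```python
import statistics

def average_normalize(values):
    """Eq. (2): divide each value by the sample average (assumed non-zero)."""
    mean = statistics.fmean(values)
    return [x / mean for x in values]

# Hypothetical ECa readings (mS/m) and the same readings shifted by a constant.
eca = [4.0, 6.0, 8.0, 10.0, 12.0]
shifted = [x + 100.0 for x in eca]

z = average_normalize(eca)
z_shifted = average_normalize(shifted)

# The shift leaves the raw spread unchanged but compresses the normalized spread.
spread = max(z) - min(z)
spread_shifted = max(z_shifted) - min(z_shifted)
print("spread:", round(spread, 4), "spread after shift:", round(spread_shifted, 4))
```

In this sketch the normalized values always average 1, but their spread shrinks sharply once the constant is added, which is one way the average method can alter the weight a variable carries in a distance calculation.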

Good results were also reported by Milligan and Cooper (1988), Bazzi et al. (2013), Gavioli et al. (2016), and Schenatto et al. (2016) using the range normalization method (Eq. (3)). The normalized values are bounded by 0.0 and 1.0, with at least one observed value at each of these end points. The Min(X) value in the numerator of Eq. (3) can be replaced by Median(X) (Mielke and Berry, 2007) with the same behavior, because Min(X) and Median(X) are constants and do not change the data distribution.

$Z = \frac{X - Min(X)}{Max(X) - Min(X)} \qquad (3)$
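A minimal sketch of Eq. (3) and of the median-centered variant is given below. The data values are hypothetical, and `median_range_normalize` is a name introduced here for the variant in which Median(X) replaces Min(X) in the numerator only.

```python
import statistics

def range_normalize(values):
    """Eq. (3): (X - Min) / (Max - Min); requires at least two distinct values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range normalization is undefined for constant data")
    return [(x - lo) / (hi - lo) for x in values]

def median_range_normalize(values):
    """Variant of Eq. (3) with Median(X) in place of Min(X) in the numerator."""
    lo, hi = min(values), max(values)
    med = statistics.median(values)
    return [(x - med) / (hi - lo) for x in values]

# Hypothetical ECa readings (mS/m).
eca = [4.0, 6.0, 7.0, 9.0, 14.0]
z = range_normalize(eca)
z_med = median_range_normalize(eca)

# The two versions differ only by a constant shift at every point.
offsets = [a - b for a, b in zip(z, z_med)]
print([round(v, 3) for v in z])
print([round(v, 3) for v in z_med])
```

Note that the median-centered values are no longer bounded by 0 and 1; they are the range-normalized values shifted by a constant, so the pairwise distances between observations are the same in both versions.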

The goal of this study was to evaluate the performance of these normalization methods, which are frequently used when clustering data with the Fuzzy C-Means algorithm to delineate MZs.

Materials and methods

A step-by-step flowchart (Fig. 1) was created to show the methodology used.

Yield analysis and variables selected

Descriptive statistics were calculated for the yield data at the sampling points for each crop considered, and for the normalized yield data (Table 2). The best years for soybean were 2014 (Field A), 2012 (Field B), and 2011 (Field C), and for corn, 2014. For each field, a more representative yield variable was created from the yield data of all evaluated years using the standard score normalization technique (Eq. (1)) (Kitchen et al., 2005, Milani et al., 2006, Suszek et al., 2011).
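One way to build such a multi-year yield variable can be sketched as follows. The yield figures are hypothetical, and averaging the per-year standard scores at each sampling point is an assumption made for illustration; the snippet above states only that Eq. (1) was used, not how the years were combined.

```python
import statistics

def standard_score(values):
    """Eq. (1), using the sample standard deviation (an assumption)."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

# Hypothetical yields (t/ha) at the same four sampling points in three seasons.
yield_by_year = {
    "soybean_2012": [3.1, 3.4, 2.8, 3.9],
    "corn_2013": [9.5, 10.2, 8.1, 11.0],
    "soybean_2014": [3.6, 3.5, 3.0, 4.2],
}

# Normalize each season separately so corn and soybean become comparable.
normalized = [standard_score(v) for v in yield_by_year.values()]

# Assumed combination rule: average the standard scores at each sampling point.
combined = [statistics.fmean(scores) for scores in zip(*normalized)]
print([round(v, 3) for v in combined])
```

Because each season is rescaled to a mean of 0 and unit variance before combining, a high-yielding crop such as corn does not outweigh soybean in the resulting variable.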

Conclusions

The normalization methods influenced the clustering process when using two or more attributes with different scales (for fields A and B). This indicates that each method influenced data differently. When only one variable (field C) was used, the maps created with and without normalization were identical.
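The single-variable result is expected on theoretical grounds: rescaling one variable multiplies every distance by the same constant, and fuzzy C-means memberships depend only on ratios of distances. The sketch below illustrates this with a minimal one-variable fuzzy C-means written for this purpose; the elevation values, the evenly spaced starting centers, and the fixed iteration count are assumptions, not the configuration used in the study.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100):
    """Minimal one-variable fuzzy C-means; returns hard labels (largest membership)."""
    lo, hi = min(values), max(values)
    # Deterministic start (assumption): centers evenly spaced over the data range.
    centers = [lo + (k + 0.5) * (hi - lo) / c for k in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to that center.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # Standard membership update: depends only on distance ratios.
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Center update: membership-weighted mean of the data.
        centers = [
            sum((u[k] ** m) * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(c)
        ]
    return [max(range(c), key=lambda k: u[k]) for u in memberships]

# Hypothetical elevation readings (m), clustered raw and after range normalization.
elevation = [650.0, 652.0, 655.0, 671.0, 675.0, 680.0]
lo, hi = min(elevation), max(elevation)
elevation_norm = [(x - lo) / (hi - lo) for x in elevation]

labels_raw = fuzzy_c_means_1d(elevation)
labels_norm = fuzzy_c_means_1d(elevation_norm)
print(labels_raw, labels_norm)
```

In this sketch both runs assign every point to the same zone, in line with the identical maps reported for field C.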

The optimal number of zones (two zones) selected using the evaluation indices (MPE, FPI, and VR) and ANOVA was the same for each applied normalization method. Furthermore, the agreement among

Acknowledgments

The authors are grateful to the State University of Western Paraná, the Technological Federal University of Paraná, the Araucária Foundation (Fundação Araucária), the Coordination for the Improvement of Higher Education Personnel (CAPES), and the National Council for Scientific and Technological Development (CNPq) for the support received, and to the agronomic engineers Aldo Tasca and Agassis Linhares for providing the research area.

References (56)

H. Arslan. Spatial and temporal mapping of groundwater salinity using ordinary kriging and indicator kriging: the case of Bafra Plain, Turkey. Agric. Water Manage. (2012)

Q. Fu et al. Delineating soil nutrient management zones based on fuzzy clustering optimized by PSO. Math. Comput. Modell. (2010)

A. Gavioli et al. Optimization of management zone delineation by using spatial principal components. Comput. Electron. Agric. (2016)

L.S. Iliadis et al. An intelligent system employing an enhanced fuzzy C-Means clustering model: application in the case of forest fires. Comput. Electron. Agric. (2010)

D.B. Jaynes et al. Identifying potential soybean management zones from multi-year yield data. Comput. Electron. Agric. (2005)

N.R. Kitchen et al. Delineating productivity zones on claypan soil fields using apparent soil electrical conductivity. Comput. Electron. Agric. (2005)

Y. Li et al. Delineation of site-specific management zones using fuzzy clustering analysis in a coastal saline land. Comput. Electron. Agric. (2007)

S.A. Mingoti et al. Comparing SOM neural network with Fuzzy C-means, K-means and traditional hierarchical clustering algorithms. Eur. J. Oper. Res. (2006)

F.J. Moral et al. Delineation of management zones using mobile measurements of soil apparent electrical conductivity and multivariate geostatistical techniques. Soil Tillage Res. (2010)

R.A. Ortega et al. Determination of management zones in corn (Zea mays L.) based on soil fertility. Comput. Electron. Agric. (2007)

M. Pedroso et al. A segmentation algorithm for the delineation of agricultural management zones. Comput. Electron. Agric. (2010)

N.R. Peralta et al. Delineation of management zones with soil apparent electrical conductivity to improve nutrient management. Comput. Electron. Agric. (2013)

J.V. Stafford et al. Mapping and interpreting the yield variation in cereal crops. Comput. Electron. Agric. (1996)

Y. Sun et al. Comparison of interpolation methods for depth to groundwater and its temporal and spatial variations in the Minqin oasis of northwest China. Environm. Modell. Softw. (2009)

L. Xiang et al. Delineation and scale effect of precision agriculture management zones using yield monitor data over four years. Agric. Sci. Chin. (2007)

M.R. Anderberg. Cluster Analysis for Applications. (1973)

J. Arno et al. Clustering of grape yield maps to delineate site-specific management zones. Spanish J. Agric. Res. (2011)

B.S. Bansod et al. An application of PCA and fuzzy C-Means to delineate management zones and variability analysis of soil. Eurasian Soil Sci. (2013)

C.L. Bazzi et al. Management zones applied to pear orchard. Int. J. Food Agric. Environm. (2015)

C.L. Bazzi et al. Management zones definition using soil chemical and physical attributes in a soybean area. Engenharia Agrícola (2013)

J.C. Bezdek. Pattern Recognition With Fuzzy Objective Function Algorithms. (1981)

C.D. Bonham et al. Spatial cross-correlation of Bouteloua gracilis with site factor. Grassland Sci. (1995)

B. Boydell et al. Identifying potential within-field management zones from cotton yield estimates. Precision Agric. (2002)

H.A. Bunselmeyer et al. Using corn and soybean yield history to predict subfield yield response. Agron. J. (2015)

P.A. Burrough. Fuzzy mathematical methods for soil survey and land evaluation. Eur. J. Soil Sci. (1989)

J.A. Cohen. Coefficient of agreement for nominal scales. Educ. Psychol. Measur. (1960)

Czaplewski, R.L., Reich, R.M., 1993. Expected value and variance of Moran's bivariate spatial autocorrelation statistic...

A. Dobermann et al. Classification of crop yield variability in irrigated production fields. Agron. J. (2003)
