Normalization of data for delineating management zones
Introduction
The study of the spatial distribution of soil and plant variables is important for establishing appropriate management zones (MZs) for fertilizer application, soil management, and irrigation. Appropriate MZs may maximize yield while reducing costs and minimizing potential environmental damage (Tilman et al., 2011; Li et al., 2013; Bansod and Pandey, 2013; Hedley, 2015).
An MZ is defined as a subregion of a field that exhibits similar combinations of yield-limiting factors (Tagarakis et al., 2013). MZs facilitate the application of precision agriculture (PA) techniques by reducing the costs of adoption and implementation, since MZs can be managed with constant-rate equipment and may reduce the number of samples needed to characterize soil nutrient availability. Delineating MZs is not a simple task because numerous variables may influence crop yield. Because an MZ is often used for several years, the variables considered should be temporally stable (Doerge, 2000) and correlated with yield. Among the variables identified in the literature as having good potential to delineate temporally stable MZs are elevation (Bazzi et al., 2015; Fraisse et al., 2001; Jaynes et al., 2005; Peralta and Costa, 2013; Farid et al., 2016; Schepers et al., 2004), apparent soil electrical conductivity (ECa) (Li et al., 2007; Farid et al., 2016), soil penetration resistance (Gavioli et al., 2016), and soil texture (Farid et al., 2016).
Techniques such as principal component analysis (PCA) (Bansod and Pandey, 2013) and Moran's bivariate spatial autocorrelation statistic, proposed by Czaplewski and Reich (1993) and used by Reich et al. (1994) and Bonham et al. (1995), can be used to create (when PCA is used) or select layers for delineating MZs. When more than one crop is cultivated in the same field during the year, which is common practice in Brazil, normalizing yield data makes it possible to create a more representative variable (Bunselmeyer and Lauer, 2015) to be used in ANOVA and Tukey's test.
Several techniques to delineate MZs have been proposed in the literature (Pedroso et al., 2010; Xiang et al., 2007); however, the most widely used is cluster analysis (Li et al., 2007; Iliadis et al., 2010). The most commonly used clustering methods for delineating MZs are the K-means algorithm (Rodrigues Junior et al., 2011; Ortega and Santibáñez, 2007) and fuzzy C-means (Li et al., 2007; Li et al., 2013; Fu et al., 2010; Zhang et al., 2013; Moral et al., 2010). Fuzzy C-means, which incorporates fuzzy logic theory into the partitioning algorithm, uses a weighting exponent to control the degree of sharing between classes (Bezdek, 1981), allowing individuals to exhibit partial membership in each of the classes, which is important when dealing with the continuous variability of natural phenomena (Burrough, 1989). Before clusters can be formed, it is necessary to establish an appropriate measure of similarity. Euclidean distance is the most commonly used; this measure gives equal weight to all measured variables and is sensitive to correlated variables (Bezdek, 1981). In geometrical terms, the Euclidean distance creates clusters having a spherical shape, which rarely occur in soil (Odeh et al., 1992). Fridgen et al. (2004) report that Euclidean distance should be used only for statistically independent variables with equal variances. Hence, when the Euclidean distance is used for clustering, data normalization can be a very important step before creating MZs.
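As a minimal sketch of the membership step described above, assuming the standard fuzzy C-means update attributed to Bezdek (1981) with Euclidean distance, the snippet below computes how one point is shared between two classes for two values of the weighting exponent. The function name `fcm_memberships`, the class centers, and the sample point are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance: every variable contributes with equal weight."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each class (standard FCM update).

    m > 1 is the weighting exponent; a larger m means more sharing.
    """
    dists = [euclidean(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a center: full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


# Hypothetical class centers and sample point (illustrative values only).
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

sharper = fcm_memberships(point, centers, m=2.0)  # about [0.9, 0.1]
fuzzier = fcm_memberships(point, centers, m=3.0)  # about [0.75, 0.25]
```

Because the memberships depend only on the distances, any variable measured on a much larger scale than the others would dominate `euclidean` and therefore the partition, which is the motivation for normalizing beforehand.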
Normalization methods such as the standard score or Z-score method (Eq. (1)) have been used by many researchers for delineating MZs (Anderberg, 1973; Romesburg, 1984; Larscheid and Blackmore, 1996; Stafford et al., 1996; Molin, 2002; Kitchen et al., 2005). This method transforms normal variables to standard scores, so that the transformed variable has a mean of 0.0 and a variance of 1.00:
$$Z = \frac{X - \bar{X}}{s} \qquad (1)$$
where $X$ is the original data value; $\bar{X}$ is the sample average; and $s$ is the standard deviation.
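The standard score can be sketched in a few lines of Python. This is an illustration only: it assumes that s is the sample standard deviation (n - 1 denominator), which the text does not state explicitly, and the elevation readings are hypothetical.

```python
import statistics


def z_score(values):
    """Standard score: (X - mean) / s for every value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: sample standard deviation
    return [(x - mean) / s for x in values]


# Hypothetical elevation readings in metres (illustrative values only).
elevation = [612.0, 615.5, 618.2, 620.1, 624.7]
z = z_score(elevation)

z_mean = statistics.mean(z)          # approximately 0
z_variance = statistics.variance(z)  # approximately 1
```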
Several studies have reported the use of the average method (Eq. (2)) for delineating MZs (Stafford et al., 1996; Molin, 2002; Kitchen et al., 2005), on the assumption that the average represents the dataset well; however, the average is sensitive, can be modified by adding any constant, and can easily change the distribution of the normalized data (Anderberg, 1973).
Good results were also reported by Milligan and Cooper (1988), Bazzi et al. (2013), Gavioli et al. (2016), and Schenatto et al. (2016) using the range normalization method (Eq. (3)). The normalized values are bounded by 0.0 and 1.0, with at least one observed value at each of these end points. The Min(X) value used in Eq. (3) can be replaced by Median(X) (Mielke and Berry, 2007) with the same behavior, because Min(X) and Median(X) are constants and do not change the data distribution.
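The range method and the effect of swapping Min(X) for Median(X) can be sketched as follows. The form (X - Min(X)) / (Max(X) - Min(X)) is an assumption inferred from the description above (values bounded by 0.0 and 1.0), and the function name, the `shift` parameter, and the ECa readings are hypothetical.

```python
import statistics


def range_normalize(values, shift=None):
    """(X - shift) / (Max(X) - Min(X)); shift defaults to Min(X)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range is zero; the variable is constant")
    if shift is None:
        shift = lo
    return [(x - shift) / (hi - lo) for x in values]


# Hypothetical ECa readings (illustrative values only).
eca = [10.0, 20.0, 40.0]
by_min = range_normalize(eca)
by_median = range_normalize(eca, shift=statistics.median(eca))

# Differences between consecutive normalized values under each variant.
gaps_min = [b - a for a, b in zip(by_min, by_min[1:])]
gaps_median = [b - a for a, b in zip(by_median, by_median[1:])]
```

Both variants give the same gaps between points, so the distances seen by a clustering algorithm are unchanged; only the Min(X) variant places the values in the interval from 0.0 to 1.0.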
The goal of this study was to evaluate the performance of these methods, which are frequently used when clustering data with the Fuzzy C-Means algorithm to delineate MZs.
Materials and methods
A step-by-step flowchart (Fig. 1) was created to show the methodology used.
Yield analysis and variables selected
Descriptive statistics were calculated for the yield data at the sampling points for each crop considered, and for the normalized yield data (Table 2). The best years for soybean were 2014 (Field A), 2012 (Field B), and 2011 (Field C), and for corn, 2014. For each field, a more representative yield variable was created using the yield data of all evaluated years; the standard score normalization technique (Eq. (1)) was used (Kitchen et al., 2005; Milani et al., 2006; Suszek et al., 2011).
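One possible way to build such a multi-year yield variable is sketched below. The text only states that the standard score was applied; averaging the standardized yields across crop-years at each sampling point is an assumption made here for illustration, and all names and yield values are hypothetical.

```python
import statistics


def z_score(values):
    """Standard score with the sample standard deviation (assumption)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Hypothetical yields (t/ha) at the same four sampling points.
yields_by_crop_year = {
    "soybean_2012": [3.1, 3.4, 2.8, 3.6],
    "corn_2014": [8.9, 9.6, 8.1, 10.2],
}

# Standardizing removes the scale difference between crops and years.
standardized = {k: z_score(v) for k, v in yields_by_crop_year.items()}

# Assumption: combine crop-years by averaging at each sampling point.
combined_yield = [statistics.mean(col) for col in zip(*standardized.values())]
```

After standardization, soybean and corn yields are expressed in the same unitless scale, so neither crop dominates the combined variable.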
Conclusions
The normalization methods influenced the clustering process when using two or more attributes with different scales (for fields A and B). This indicates that each method influenced data differently. When only one variable (field C) was used, the maps created with and without normalization were identical.
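The single-variable result is consistent with how fuzzy memberships are computed: with one variable, normalization is a shift and a positive rescaling, so the ratios between distances do not change. The sketch below checks only the membership step for fixed class centers, assuming the standard fuzzy C-means update; the values, centers, and function names are hypothetical.

```python
def memberships_1d(x, centers, m=2.0):
    """Standard FCM memberships of a single value x (x must differ from all centers)."""
    dists = [abs(x - c) for c in centers]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


def rescale(v, lo, hi):
    """Range normalization of one value given the variable's min and max."""
    return (v - lo) / (hi - lo)


# Hypothetical single-variable data and class centers (illustrative only).
values = [2.1, 2.4, 3.8, 4.0]
centers = [2.25, 3.9]
lo, hi = min(values), max(values)

raw = [memberships_1d(v, centers) for v in values]

# Apply the same normalization to the data and to the centers.
norm_centers = [rescale(c, lo, hi) for c in centers]
normalized = [memberships_1d(rescale(v, lo, hi), norm_centers) for v in values]

# Largest difference between raw and normalized memberships.
max_diff = max(
    abs(a - b) for r, n in zip(raw, normalized) for a, b in zip(r, n)
)
```

With two or more variables on different scales, each variable is rescaled by a different factor, the distance ratios change, and the memberships can differ between normalization methods.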
The optimal number of zones (two zones) selected using the evaluation indices (MPE, FPI, and VR) and ANOVA was the same for each applied normalization method. Furthermore, the agreement among
Acknowledgments
The authors are grateful to the State University of Western Paraná, the Technological Federal University of Paraná, the Araucária Foundation (Fundação Araucária), the Coordination for the Improvement of Higher Education Personnel (CAPES), and the National Council for Scientific and Technological Development (CNPq) for the support received, and to the agronomists Aldo Tasca and Agassis Linhares for providing the research area.
References (56)
Spatial and temporal mapping of groundwater salinity using ordinary kriging and indicator kriging: the case of Bafra Plain, Turkey. Agric. Water Manage. (2012)
Delineating soil nutrient management zones based on fuzzy clustering optimized by PSO. Math. Comput. Modell. (2010)
Optimization of management zone delineation by using spatial principal components. Comput. Electron. Agric. (2016)
An intelligent system employing an enhanced fuzzy C-Means clustering model: application in the case of forest fires. Comput. Electron. Agric. (2010)
Identifying potential soybean management zones from multi-year yield data. Comput. Electron. Agric. (2005)
Delineating productivity zones on claypan soil fields using apparent soil electrical conductivity. Comput. Electron. Agric. (2005)
Delineation of site-specific management zones using fuzzy clustering analysis in a coastal saline land. Comput. Electron. Agric. (2007)
Comparing SOM neural network with Fuzzy C-means, K-means and traditional hierarchical clustering algorithms. Eur. J. Oper. Res. (2006)
Delineation of management zones using mobile measurements of soil apparent electrical conductivity and multivariate geostatistical techniques. Soil Tillage Res. (2010)
Determination of management zones in corn (Zea mays L.) based on soil fertility. Comput. Electron. Agric. (2007)
A segmentation algorithm for the delineation of agricultural management zones. Comput. Electron. Agric.
Delineation of management zones with soil apparent electrical conductivity to improve nutrient management. Comput. Electron. Agric.
Mapping and interpreting the yield variation in cereal crops. Comput. Electron. Agric.
Comparison of interpolation methods for depth to groundwater and its temporal and spatial variations in the Minqin oasis of northwest China. Environm. Modell. Softw.
Delineation and scale effect of precision agriculture management zones using yield monitor data over four years. Agric. Sci. Chin.
Cluster Analysis for Applications.
Clustering of grape yield maps to delineate site-specific management zones. Spanish J. Agric. Res.
An application of PCA and fuzzy C-Means to delineate management zones and variability analysis of soil. Eurasian Soil Sci.
Management zones applied to pear orchard. Int. J. Food Agric. Environm.
Management zones definition using soil chemical and physical attributes in a soybean area. Engenharia Agrícola
Pattern Recognition With Fuzzy Objective Function Algorithms.
Spatial cross-correlation of Boutelua gracilis with site factor. Grassland Sci.
Identifying potential within-field management zones from cotton yield estimates. Precision Agric.
Using corn and soybean yield history to predict subfield yield response. Agron. J.
Fuzzy mathematical methods for soil survey and land evaluation. Eur. J. Soil Sci.
Coefficient of agreement for nominal scales. Educ. Psychol. Measur.
Classification of crop yield variability in irrigated production fields. Agron. J.