
1 Introduction

Soil testing and formulated fertilization depends on sampling point data, but collecting that data is hampered by difficult sampling [1], a heavy workload [2], high cost [3], and similar problems. As a result, at the fertilization decision stage there are too few sampling points to cover every field parcel in each region. In practice, soil nutrient values at unsampled locations are therefore often estimated by interpolating the existing sampling point data, and spatial interpolation is a commonly used technique for this purpose. It applies statistical methods to the nutrient data of a relatively sparse set of soil sample points to estimate values at unsampled points. The result is either a denser point distribution or area data for different areal units. Once the soil nutrient values of the unsampled points have been predicted, precise fertilization can be carried out over the whole area.

With the rapid development of mobile intelligent terminals and distributed computing, more and more applications have moved their computing and data processing to the client side [4]. However, mobile GIS suffers from frequent disconnection [5] and low reliability [6], so most spatial interpolation for mobile devices still has to rely on the server side. This limits the scenarios in which mobile-oriented spatial interpolation can be used. Making full use of modern computing technology can provide more configurable, extensible, and customizable distributed applications. It also makes it possible to run a traditional soil testing and formulated fertilization recommendation system on embedded devices such as high-performance, low-cost mobile intelligent terminals [7, 8]. However, most current soil testing and formulated fertilization systems [9–11] and soil nutrient spatial interpolation techniques [12–14] are built on GIS platforms and are hard to port to mobile platforms. Fully integrating mobile technology with soil testing and formula technology would let users query the soil nutrients of any land parcel through intelligent mobile terminals and receive personalized fertilization recommendations. Research on a soil nutrient spatial interpolation technique that is free from the limitations of GIS platforms is therefore significant for the application and popularization of soil testing and formulated fertilization technology.

2 Materials and Methods

2.1 Source of Material

The data used in this paper came from 238 soil nutrient sampling points collected in 2014 in the 10,000-mu precision agriculture model production field of Changge City, Henan Province. Each sampling point record consists of the sample number, land parcel number, soil texture, preceding crop, pH value, organic matter, rapidly available phosphorus, rapidly available potassium, and latitude and longitude.
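As an illustration only, one sampling record could be held in memory as a Python dataclass. The field names, the comma-separated column order, and the `parse_record` helper are hypothetical. The paper lists which attributes a record contains but not how they are stored.

```python
from dataclasses import dataclass


@dataclass
class SamplingPoint:
    """One soil nutrient sampling record (hypothetical field names)."""
    sample_id: str
    parcel_id: str
    soil_texture: str
    preceding_crop: str
    ph: float
    organic_matter: float
    available_p: float  # rapidly available phosphorus
    available_k: float  # rapidly available potassium
    lat: float
    lon: float


def parse_record(line: str) -> SamplingPoint:
    """Parse one comma-separated line in the assumed column order above."""
    f = [s.strip() for s in line.split(",")]
    return SamplingPoint(f[0], f[1], f[2], f[3],
                         float(f[4]), float(f[5]), float(f[6]), float(f[7]),
                         float(f[8]), float(f[9]))
```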

2.2 Research Method

2.2.1 Inverse Distance Weighted (IDW) Interpolation

Many methods exist for soil nutrient interpolation. Inverse Distance Weighted (IDW) interpolation and Kriging interpolation [15–17] are considered the two most widely used. In practice, Kriging interpolation is often affected by many factors [18, 19]. Considering also the complexity of the Kriging interpolation algorithm, this study adopted Inverse Distance Weighted interpolation as the basis of the interpolation algorithm for soil nutrient data.

Inverse distance weighted interpolation is based on the principle of similarity: points that are close to each other tend to have more similar values than points that are far apart. The value of a cell is computed as the inverse-distance weighted average of the discrete sample points in its neighborhood. Closer points receive greater weights, so the value at the estimated point is fitted as a linear weighted combination of the surrounding neighbor points.
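For reference, the standard IDW estimator can be written as follows. The text does not state the distance power, so the exponent $p$ is an assumption ($p = 2$ is a common choice). Here $z_i$ are the values of the $n$ neighboring sample points $x_i$, and $x_0$ is the point being estimated.

```latex
\hat{z}(x_0) = \frac{\sum_{i=1}^{n} w_i \, z_i}{\sum_{i=1}^{n} w_i},
\qquad
w_i = \frac{1}{d_i^{\,p}},
\qquad
d_i = \lVert x_0 - x_i \rVert > 0
```

When $x_0$ coincides with a sample point ($d_i = 0$), the estimate is usually taken to be that point's value $z_i$.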

IDW is a simple and effective interpolation method with relatively fast computation. However, it also has obvious flaws. For example, it considers only the spatial distance between the estimated point and the sample points. In addition, the weight calculation it uses lacks an exact physical basis [20].

2.2.2 K-Nearest Neighbor (KNN) Algorithm and Spatial Index of Soil Nutrient Sampling Points Based on K-D Tree

The soil nutrient content at a sampling point depends not only on the distance between sample points but also on many other factors, such as variation in soil texture. To improve the IDW algorithm, this study therefore introduced the K-Nearest Neighbor algorithm to select and search for adjacent points when interpolating soil nutrients with inverse distance weighted interpolation.

The K-Nearest Neighbor (KNN) classification algorithm was proposed by Cover and Hart in 1967 [20]. It is a simple, analogy-based algorithm and is now one of the most theoretically mature machine learning algorithms. The idea is as follows: if most of the k most similar (i.e., nearest) samples to a given sample in feature space belong to a certain category, the sample also belongs to that category. Given a query point and a positive integer K, the algorithm finds the K data points in the data set that are closest to the query point.
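The two ideas in this paragraph, finding the K closest points and voting on a category, can be sketched in Python with a brute-force search. This is an illustration only. The function names are hypothetical, and the linear scan over all points is exactly what the K-D tree index discussed next is meant to avoid.

```python
import heapq
import math
from collections import Counter
from typing import Hashable, Sequence


def k_nearest(data: Sequence[Sequence[float]], query: Sequence[float], k: int) -> list:
    """Return the indices of the k points in `data` closest to `query` (brute force)."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: math.dist(data[i], query))


def knn_classify(data: Sequence[Sequence[float]], labels: Sequence[Hashable],
                 query: Sequence[float], k: int) -> Hashable:
    """Majority vote among the labels of the k nearest samples."""
    votes = Counter(labels[i] for i in k_nearest(data, query, k))
    # On a tied vote, most_common keeps the label that was counted first.
    return votes.most_common(1)[0][0]
```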

Implementing the KNN algorithm requires a spatial index for the data. Spatial indexes commonly used in GIS interpolation algorithms include the K-D tree, K-D-B tree, BSP tree, the R-tree family, the quadtree, the grid, and many others [21]. The K-D tree (K-dimensional binary search tree) is a main-memory data structure that generalizes the binary search tree to multidimensional data, that is, a binary tree in K-dimensional space. It is a dynamic index structure that partitions the data space and is well suited to indexing spatial point objects [22]. This paper therefore built the spatial index of the soil nutrient sampling points with a K-D tree.

3 Design and Simulation of Algorithm

3.1 Workflow Design

The proposed KNN- and IDW-based soil nutrient spatial interpolation algorithm works as follows. First, a KD tree is constructed to build a spatial index of the soil nutrient sampling points, and the sampling point data set is then screened further. Specifically, the soil nutrient training data set is indexed with a KD tree, a space-partitioning tree that divides the search space of the soil sampling points hierarchically. For each new input point, fast matching then finds its K nearest sampling points in the training data set, and these K sampling points serve as the spatial interpolation data. The algorithm flow is shown in Fig. 1.

Fig. 1. Process design of the algorithm

(1) Read the current sampling point data set and prepare it for processing;

(2) Preprocess the sampling point data by removing abnormal values, such as outliers and points lying outside the administrative boundary, to reduce the error of the interpolation results;

(3) Construct the KD tree to build the spatial index of the sampling point data;

(4) Partition the soil nutrient sampling point data spatially using the KD tree;

(5) Screen the current data set with the K-Nearest Neighbor algorithm to find the sampling points nearest to the point being interpolated, forming the candidate interpolation data set;

(6) Traverse the candidate data set and check whether each point has the same soil texture as the current sampling point. Points whose soil texture differs are discarded;

(7) Select the final interpolation data set based on step (6);

(8) Check whether the number of points in the final data set, "cout", is less than 2. If it is, the interpolation fails and "false" is returned; otherwise, the process continues;

(9) Loop through the final data set and compute the weighted average for the current unknown point from the data points;

(10) Compute and output the interpolation result using inverse distance weighted interpolation.
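The ten workflow steps can be sketched for a single target point in Python. This is a simplified sketch under stated assumptions, not the authors' implementation. A brute-force distance sort stands in for the KD-tree index of steps (3) to (5). A rectangular `bbox` is a hypothetical substitute for the administrative-boundary check of step (2), and outlier removal is omitted. The IDW power defaults to 2, and `None` plays the role of the "false" return value. `Sample` and `interpolate_point` are hypothetical names.

```python
import math
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    x: float
    y: float
    texture: str
    value: float  # one nutrient, e.g. pH


def interpolate_point(samples, x, y, texture, k=10, bbox=None, power=2.0) -> Optional[float]:
    """Workflow steps (1)-(10) for one target point (hypothetical sketch)."""
    # (1)-(2) pre-process: drop points outside an assumed bounding box
    if bbox is not None:
        xmin, ymin, xmax, ymax = bbox
        samples = [s for s in samples if xmin <= s.x <= xmax and ymin <= s.y <= ymax]
    # (3)-(5) K nearest neighbours of the target (brute force instead of a KD tree)
    candidates = sorted(samples, key=lambda s: math.hypot(s.x - x, s.y - y))[:k]
    # (6)-(7) keep only neighbours with the same soil texture as the target
    final = [s for s in candidates if s.texture == texture]
    # (8) fewer than 2 usable points: interpolation fails
    if len(final) < 2:
        return None
    # (9)-(10) inverse distance weighted average
    num = den = 0.0
    for s in final:
        d = math.hypot(s.x - x, s.y - y)
        if d == 0.0:
            return s.value  # target coincides with a sample point
        w = 1.0 / d ** power
        num += w * s.value
        den += w
    return num / den
```

Applying the texture filter after the K-neighbor search, as the workflow does, means the final set can be smaller than K, which is why step (8) needs the minimum-count check.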

3.2 Algorithm Implementation

Based on the above design, the algorithm is implemented in three steps: 1. KD tree construction, 2. K-nearest neighbor search, and 3. inverse distance weighted interpolation.

1. KD Tree Construction

The algorithm first defines the KD tree data structure and then constructs the KD tree on top of it. The construction steps are as follows:

(1) If the sampling point data set "Dataset" is empty, return an empty KD tree. Otherwise, execute the node generation procedure.

(2) Determine the splitting dimension, stored in the "Split" field, as the dimension of maximum variance. Compute the variance of all sampling points to be interpolated in each spatial dimension, choose the dimension with the largest variance as the partition dimension, and store its index as the value of the "Split" field.

(3) Determine the root node "Node". Sort the data along the selected partition dimension and use the sampling point in the middle (the median) as the data of "Node".

(4) Determine the left subspace "leftNodeData" and the right subspace "rightNodeData" of "Node". Sampling points whose coordinate in the partition dimension is less than that of "Node" form the left subspace, and sampling points whose coordinate is greater form the right subspace.

(5) Apply the above process recursively to "leftNodeData" and "rightNodeData" until a subspace contains no data.
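Steps (1) to (5) translate fairly directly into a recursive Python function. In this sketch the names `KDNode` and `build_kdtree` are hypothetical, and the population variance is used for the split test. Points whose coordinate equals the median may land in either subtree, an assumption, since the text only specifies strictly smaller and larger values.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KDNode:
    point: tuple                      # sampling point stored at this node ("Node")
    split: int                        # index of the splitting dimension ("Split")
    left: Optional[KDNode] = None     # subtree built from "leftNodeData"
    right: Optional[KDNode] = None    # subtree built from "rightNodeData"


def build_kdtree(dataset: Sequence[tuple]) -> Optional[KDNode]:
    # (1) an empty data set yields an empty tree
    if not dataset:
        return None
    dims = len(dataset[0])
    # (2) split on the dimension with the largest (population) variance
    split = max(range(dims), key=lambda d: statistics.pvariance([p[d] for p in dataset]))
    # (3) the median point along that dimension becomes the node
    ordered = sorted(dataset, key=lambda p: p[split])
    mid = len(ordered) // 2
    # (4) points before / after the median form the left / right subspaces
    # (5) recurse until a subspace is empty
    return KDNode(ordered[mid], split,
                  build_kdtree(ordered[:mid]),
                  build_kdtree(ordered[mid + 1:]))
```

Splitting at the median keeps the tree roughly balanced, so its depth grows logarithmically with the number of sampling points.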

2. K-Nearest Neighbor Search

This study applies the KD tree nearest neighbor query algorithm to the constructed KD tree "Kd". The algorithm takes the target point "target" as input and outputs the set "nearest" of its K nearest neighbors. The steps of the K-nearest neighbor search are as follows:

(1) Find the leaf node containing the target point "target". Starting from the root node, search the KD tree recursively downward. If the coordinate of "target" in the current split dimension is less than that of the split point, move to the left child node; otherwise, move to the right child node.

(2) Take the leaf node found in step (1) as the current nearest point "currnearest".

(3) Backtrack recursively to check other possible nearest points. If a node is closer to "target" than the current "currnearest", update "currnearest". The other subtree of a node needs to be visited only if the distance from "target" to that node's splitting hyperplane is smaller than the distance from "target" to "currnearest".

(4) The search ends when backtracking returns to the root node. The final "currnearest" is the nearest neighbor of "target".

(5) Add the current nearest neighbor to the "nearest" set and remove it from the tree "Kd".

(6) Repeat the above process until the "nearest" set contains K elements, then stop the search and return "nearest".
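The search can be sketched in Python as shown here. Two details are assumptions rather than the text's wording. First, instead of physically deleting each found point from "Kd" in step (5), the sketch records it in an `excluded` set and skips it in later searches; this gives the same result with simpler code, assuming distinct coordinates. Second, the tree is rebuilt inline with a compact version of the construction steps so the block runs on its own. All names are hypothetical.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class KDNode:
    point: tuple
    split: int
    left: Optional[KDNode] = None
    right: Optional[KDNode] = None


def build_kdtree(points):
    """Compact max-variance / median KD-tree build."""
    if not points:
        return None
    split = max(range(len(points[0])),
                key=lambda d: statistics.pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[split])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], split,
                  build_kdtree(ordered[:mid]), build_kdtree(ordered[mid + 1:]))


def nearest(node, target, excluded):
    """Steps (1)-(4): 1-NN search with backtracking, skipping `excluded` points."""
    if node is None:
        return None, math.inf
    diff = target[node.split] - node.point[node.split]
    # (1) descend first into the side of the split that contains the target
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best, best_d = nearest(near, target, excluded)
    # (2)-(3) while backtracking, test this node ...
    if node.point not in excluded:
        d = math.dist(node.point, target)
        if d < best_d:
            best, best_d = node.point, d
    # ... and visit the other side only if the splitting plane is closer than the best so far
    if abs(diff) < best_d:
        cand, cand_d = nearest(far, target, excluded)
        if cand_d < best_d:
            best, best_d = cand, cand_d
    return best, best_d


def k_nearest(root, target, k):
    """Steps (5)-(6): repeat the 1-NN search, excluding points already found."""
    found, excluded = [], set()
    while len(found) < k:
        p, _ = nearest(root, target, excluded)
        if p is None:  # the tree holds fewer than k points
            break
        found.append(p)
        excluded.add(p)
    return found
```

Repeating a full nearest-neighbor search K times mirrors the steps above and is easy to follow. A common alternative is a single traversal that keeps the K best candidates in a bounded max-heap.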

3. Inverse Distance Weighted Interpolation

The KD tree construction algorithm builds a spatial index tree of the current sampling points. On this basis, the K-nearest neighbor algorithm finds the K nearest sampling points, forming the nearest-neighbor set "nearest". Taking "nearest" and the target point "target" as input, inverse distance weighted interpolation then estimates the nutrient value at the target point. The steps are as follows:

(1) Calculate the distances. Traverse the "nearest" set and calculate the distance from "target" to each nearest-neighbor sampling point.

(2) Calculate the weight of each point. Compute the weight of each sampling point in "nearest" for use in the interpolation.

(3) Calculate the weighted interpolation result. Traverse the "nearest" set again, combine the point values according to their weights, and return the result.
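Steps (1) to (3) amount to a weighted average, sketched here in Python. The power parameter (default 2) and the zero-distance rule are assumptions, since the text does not say how the weights are derived from the distances. The function name `idw` and the `(coordinates, value)` pair format are hypothetical.

```python
import math


def idw(target, nearest, power=2.0):
    """Inverse distance weighted estimate at `target`.

    `nearest` is a list of (coordinates, value) pairs, e.g. the K neighbours
    returned by the KD-tree search. `power` is an assumed parameter.
    """
    if not nearest:
        raise ValueError("nearest-neighbour set is empty")
    # (1) distance from the target to each neighbour
    dists = [math.dist(target, coords) for coords, _ in nearest]
    # a neighbour at zero distance is returned directly (avoids division by zero)
    for d, (_, value) in zip(dists, nearest):
        if d == 0.0:
            return value
    # (2) weight of each neighbour: w_i = 1 / d_i ** power
    weights = [1.0 / d ** power for d in dists]
    # (3) weighted average of the neighbour values
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

For example, with values 10 and 20 at distances 1 and 2 from the target, the weights are 1 and 0.25, so the estimate is (10 + 5) / 1.25 = 12.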

4 Design and Analysis of Experiments

The soil nutrient interpolation method described above was implemented on the Android 4.4 mobile platform. The data source was the 238 soil nutrient sampling points collected in 2014 in the 10,000-mu precision agriculture model production field of Changge City, Henan Province, and the mobile hardware environment was an ARM CPU (1.6 GHz, 4 cores). As shown in Fig. 2, spatial interpolation and effect analysis were performed for pH, organic matter, rapidly available phosphorus, and rapidly available potassium in the sample data, and the algorithm ran in less than 3 s.

Fig. 2. Algorithm running results

Fig. 2 shows the results page after interpolation from the 10 nearest-neighbor sampling points. The program predicts the pH, organic matter, rapidly available phosphorus, and rapidly available potassium of the current point by interpolation and compares the predictions with the actual values.

The effectiveness of the algorithm was analyzed by cross-validation. A small number of the nutrient sampling points were set aside and excluded from the interpolation, and these reserved samples were then compared with the interpolation results. In other words, a model built from most of the samples was used to predict the small held-out sample, which also yields the prediction error. The mean absolute error, average relative error, and root-mean-square error were used to measure interpolation precision [14, 23].
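The paper does not give formulas for the three error measures. Under their common definitions (an assumption), they can be computed from the held-out observed values and their predictions as follows. The function name `cv_errors` is hypothetical, and the relative error assumes no observed value is zero.

```python
import math


def cv_errors(observed, predicted):
    """Mean absolute error, mean relative error and RMSE of held-out predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    errs = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errs) / n
    # relative error |pred - obs| / |obs|, averaged over the held-out points
    mre = sum(abs(e) / abs(o) for e, o in zip(errs, observed)) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    return mae, mre, rmse
```

Because RMSE squares each error before averaging, it weights large individual errors more heavily than the mean absolute error does.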

In this experiment, 50 sample points were randomly selected from the full sample as the points to be interpolated (predicted), and the remaining sample points were used as the interpolation data. Over repeated experiments, the effect of the number of nearest neighbors on the prediction results was tested under the same soil texture. The interpolation results are shown in Tables 1, 2, 3 and 4:

Table 1. Statistical results for pH
Table 2. Statistical results for organic matter
Table 3. Statistical results for rapidly available phosphorus
Table 4. Statistical results for rapidly available potassium
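The experimental design (hold out 50 random points, predict them from the rest, and vary the number of neighbors) can be sketched as a small harness. This is illustrative only. Samples are hypothetical `((x, y), value)` pairs assumed to share one soil texture, neighbors are found by brute force rather than with a KD tree, IDW uses an assumed power of 2, and only RMSE is reported for brevity. The names `predict` and `sweep_k` are hypothetical.

```python
import math
import random


def predict(train, target, k, power=2.0):
    """IDW estimate at `target` from its k nearest training points (brute force)."""
    near = sorted(train, key=lambda s: math.dist(s[0], target))[:k]
    num = den = 0.0
    for coords, value in near:
        d = math.dist(coords, target)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def sweep_k(samples, k_values, n_test=50, seed=0):
    """Hold out `n_test` random samples, then report the RMSE obtained with each K."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    test, train = shuffled[:n_test], shuffled[n_test:]
    results = {}
    for k in k_values:
        sq = [(predict(train, coords, k) - value) ** 2 for coords, value in test]
        results[k] = math.sqrt(sum(sq) / len(sq))
    return results
```

Fixing the random seed keeps the same hold-out set across all K values, so differences in error reflect only the number of neighbors.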

Because the algorithm is designed for mobile computing, it must provide high efficiency while guaranteeing accuracy. Repeated comparative experiments were therefore run with the execution time kept under 3 s, and the variation of the soil nutrient interpolation error was examined for neighbor sets of up to 100 points. The results are as follows:

(1) The prediction error for pH first decreased gradually as the number of neighbor points increased and reached its minimum at 85 neighbor points. There, the mean absolute error, average relative error, and root-mean-square error were 0.0405, 0.064052, and 0.4360, respectively. Beyond that point, the error gradually increased again.

(2) The prediction error for organic matter first decreased gradually as the number of neighbor points increased and reached its minimum at 15 neighbor points. There, the mean absolute error, average relative error, and root-mean-square error were 0.3870, 0.1417, and 2.4147, respectively. Beyond that point, the error gradually increased again.

(3) The prediction error for rapidly available phosphorus decreased gradually as the number of neighbor points increased.

(4) The prediction error for rapidly available potassium first decreased gradually as the number of neighbor points increased and reached its minimum at 65 neighbor points. There, the mean absolute error, average relative error, and root-mean-square error were 0.0015, 0.1885, and 23.9818, respectively. Beyond that point, the error gradually increased again.

5 Conclusions

In this study, the K-D tree was used as the space partitioning algorithm for the soil nutrient sampling points, and a spatial index of the sampling points was built. On this basis, K-nearest neighbor search of the sampling points was implemented with the KNN algorithm. Finally, KNN and IDW were combined into a soil nutrient spatial interpolation algorithm that is free from the limitations of the GIS platform.

(1) The experiments showed that the KNN- and IDW-based soil nutrient spatial interpolation algorithm is effective and feasible for predicting the pH, organic matter, rapidly available phosphorus, and rapidly available potassium contents of soil.

(2) This paper tested how the K value of the KNN algorithm affects the prediction of different soil nutrients. The best accuracy for pH, organic matter, rapidly available phosphorus, and rapidly available potassium was achieved with 85, 15, the maximal sample space, and 65 neighboring points, respectively. The corresponding optimal mean absolute errors for pH, organic matter, and rapidly available potassium were 0.0405, 0.3870, and 0.0015.

(3) Error analysis showed that the algorithm interpolates pH and organic matter more precisely than rapidly available phosphorus and rapidly available potassium, whose errors were larger. This is largely related to the high spatial variability of these two nutrients [24].

With the rapid development of mobile intelligent terminals and distributed computing, more and more applications are moving their computing and data processing to thin clients. The KNN- and IDW-based soil nutrient spatial interpolation algorithm enables a traditional soil testing and formulated fertilization recommendation system to run on embedded devices such as high-performance, low-cost mobile intelligent terminals. This allows the advantages of mobile technology and soil testing formula technology to be fully integrated. Users in the field can locate any plot and query its soil nutrients and recommended fertilization information through an intelligent mobile platform. This is significant for the accurate application of soil testing and formulated fertilization technology and for solving the "last kilometer" problem in disseminating scientific fertilization information.