Abstract
Spatial interpolation is commonly used in geometric modeling for life science applications. In large-scale spatial interpolation, a local set of data points usually has to be found for each interpolated point using a k Nearest Neighbor (kNN) search. To improve the computational efficiency of kNN search, spatial decomposition structures such as grids and trees are employed to locate the nearest neighbors quickly. Among these structures, the uniform grid is the simplest, and the size of the grid cell can strongly affect the efficiency of kNN search. In this paper, we evaluate the effect of the size of the uniform grid cell on the efficiency of kNN search. Our objective is to find a relatively optimal grid cell size by considering the distribution of the scattered points (i.e., the data points and the interpolated points). We employ the Standard Deviation of the points’ coordinates to measure the spatial distribution of the scattered points. For irregularly distributed scattered points, we perform several series of kNN searches in two dimensions. The benchmark results indicate that, in two dimensions, as the Standard Deviation of the points’ coordinates increases, the relatively optimal size of the grid cell decreases and eventually converges. We also fit the relationship between the Standard Deviation of the scattered points’ coordinates and the relatively optimal size of the grid cell. The fitted relationship could be applied to determine a relatively optimal grid cell size in kNN search and thereby improve the computational efficiency of spatial interpolation.
1 Introduction
A spatial interpolation algorithm uses the attributes at known locations (data points) to predict the attributes at unknown locations (interpolated points). Commonly used spatial interpolation algorithms include Inverse Distance Weighting (IDW) [15], Kriging [24], the Moving Least Squares (MLS) method [19], and Radial Basis Function (RBF) interpolation [5,6,7]. These methods are widely used in various scientific fields, such as Geographic Information Systems (GIS) [9, 10], geometric modeling [2, 11], image processing [8, 18], and numerical analysis [25, 27].
Interpolation algorithms are also widely used in life science applications. Liu et al. [13] proposed a hybrid approach to shape-based interpolation of stereotactic atlases of the human brain. Volkau et al. [26] combined a minimal distance map and cubic splines to reconstruct the subcortical structures of the Talairach-Tournoux atlas. Parrott et al. [23] focused on the interpolation of scalar values in the 3-D grid of input data. Pan et al. [22] compared filter interpolation, ordinary interpolation, and general partial volume interpolation in medical image interpolation.
In large-scale spatial interpolation, computational efficiency is usually improved by using a local set of data points, rather than the global set, to predict the value at each interpolated point. Thus, a local set of data points has to be found for each interpolated point, typically with an approach such as the k Nearest Neighbor (kNN) search.
For example, Li et al. [12] proposed Random kNN (a novel generalization of traditional nearest-neighbor modeling) for pattern analysis and modeling with high-dimensional data. Al Aghbari and Al-Hamadi [1] studied multiple kNN query processing techniques in constrained spatial networks. Nutanong et al. [20] studied an efficient algorithm for moving k Nearest Neighbor queries. Cavoretto et al. [4] proposed an efficient scheme for computing the triangular Shepard method. Mei et al. [17] presented an efficient AIDW interpolation algorithm on the GPU by utilizing a fast kNN search method.
Space decomposition data structures such as the RP-tree [21], VP-tree [14], k-d tree [3], and uniform grid [17] are employed to accelerate the kNN search. Among these structures, the uniform grid is the simplest. A critical issue in creating the uniform grid is the size of the grid cell: it can strongly affect the search efficiency and should be neither too small nor too large. To the best of our knowledge, no existing work focuses specifically on determining the optimal grid cell size for kNN search.
Based on our previous work [7, 16, 17], in this paper we first evaluate the effect of the size of uniform grid cell on the efficiency of kNN search, and then attempt to find the relatively optimal size of grid cell by considering the distribution of scattered points.
This paper is organized as follows. Section 2 briefly describes the kNN search that is commonly used in spatial interpolation. Section 3 introduces our benchmark tests. Section 4 presents and discusses the test results. Finally, Sect. 5 draws several conclusions.
2 Background: kNN Search in Spatial Interpolation
The kNN search algorithm is taken directly from our previous work [7, 16, 17]. The process of the kNN search is described in more detail below.
Step 1: Creating an even grid
Creating an even planar grid is straightforward. We first determine the planar rectangular region to be partitioned by finding the minimum and maximum x and y coordinates of all points. Then, the numbers of rows and columns of the grid are determined by dividing the extent of the rectangle by the width of the square cell; see the illustration in Fig. 1.
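As a minimal sketch of Step 1, the following Python function computes the bounding rectangle and the grid dimensions. The name `create_grid` and the use of `floor + 1` for the row and column counts (so that points lying on the maximum boundary still fall inside the grid) are assumptions made for illustration, not details taken from the original implementation.

```python
def create_grid(points, cell_width):
    """Return (min_x, min_y, n_rows, n_cols) of a uniform grid covering `points`.

    `points` is a sequence of (x, y) tuples and `cell_width` is the side
    length of a square cell. All names are illustrative.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # floor + 1 guarantees at least one row/column and keeps points that
    # lie exactly on the maximum boundary inside the last row/column.
    n_cols = int((max_x - min_x) / cell_width) + 1
    n_rows = int((max_y - min_y) / cell_width) + 1
    return min_x, min_y, n_rows, n_cols


grid = create_grid([(0.0, 0.0), (10.0, 4.0), (3.0, 9.5)], cell_width=2.5)
```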
Step 2: Distributing data points into cells
The objective of this step is to find the grid cell in which each data point is located, that is, to determine the row and column indices of that cell. Since the grid cells are indexed sequentially, first by rows and then by columns, this is easily carried out. First, the differences between the coordinates of a data point and the minimum coordinates of the grid are calculated; then the column and row indices are obtained by dividing these differences by the cell width.
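The index computation in Step 2 can be sketched as follows. The function name `locate_cell`, the row-major linear index `row * n_cols + col`, and the clamping of indices to the last row and column are illustrative assumptions; the text only states that cells are indexed first by rows and then by columns.

```python
def locate_cell(point, min_x, min_y, n_rows, n_cols, cell_width):
    """Return (row, col, linear_index) of the cell containing `point`."""
    # Offsets of the point from the minimum corner of the grid.
    dx = point[0] - min_x
    dy = point[1] - min_y
    # Dividing the offsets by the cell width gives the column and row.
    # Clamping guards against round-off on the maximum boundary.
    col = min(int(dx / cell_width), n_cols - 1)
    row = min(int(dy / cell_width), n_rows - 1)
    # ASSUMPTION: row-major numbering of the cells.
    return row, col, row * n_cols + col


cell = locate_cell((10.0, 4.0), 0.0, 0.0, 4, 5, 2.5)
```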
Step 3: Determining data points in each cell
The objective of this step is to determine the number and the indices of the data points located in each cell. The number of data points located in the same cell can be determined with a segmented parallel reduction. After all data points have been sorted according to their cell indices, the data points are stored sequentially in a group of segments; each segment is flagged with the cell index and contains the indices of the data points located in that cell. The number of data points in each cell is obtained by performing a reduction over each segment. Moreover, the head index of the first point of each segment can be determined using a segmented parallel scan.
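The following is a sequential Python sketch of Step 3, under the assumption that an ordinary sort, a counting loop, and an exclusive prefix sum are acceptable stand-ins for the segmented parallel reduction and scan described above. The function name and the returned layout are hypothetical.

```python
def group_points_by_cell(cell_ids, n_cells):
    """Group point indices by cell.

    Returns (order, counts, heads) where `order` lists the point indices
    sorted by cell index, `counts[c]` is the number of points in cell c,
    and `heads[c]` is the position in `order` of the first point of cell c.
    """
    # Sort point indices by the cell they belong to (stable sort).
    order = sorted(range(len(cell_ids)), key=lambda i: cell_ids[i])
    # Sequential counterpart of the per-segment reduction.
    counts = [0] * n_cells
    for c in cell_ids:
        counts[c] += 1
    # Sequential counterpart of the scan: exclusive prefix sum of counts.
    heads = [0] * n_cells
    running = 0
    for c in range(n_cells):
        heads[c] = running
        running += counts[c]
    return order, counts, heads


order, counts, heads = group_points_by_cell([2, 0, 2, 1, 0], n_cells=4)
```

The points of cell `c` are then `order[heads[c] : heads[c] + counts[c]]`.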
Step 4: Searching nearest neighbors
The process of kNN search for each interpolated point can be summarized in the following substeps: (1) locating the interpolated point in the even grid, (2) determining the level of cell expansion (see Fig. 1), and (3) finding the k nearest neighbors within the local region. More details on searching for the nearest neighboring data points of each interpolated point were presented in our previous work [17].
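The three substeps can be illustrated with the sketch below. It is not the implementation from [17]: the dictionary-based grid, the function names, and in particular the stopping rule (expand until the k-th candidate is no farther than `level * cell_width`, which makes the result exact) are assumptions chosen to keep the example short and verifiable.

```python
import math
from collections import defaultdict


def build_grid(data, cell_width):
    """Map (row, col) -> list of indices of the data points in that cell."""
    min_x = min(p[0] for p in data)
    min_y = min(p[1] for p in data)
    cells = defaultdict(list)
    for i, (x, y) in enumerate(data):
        row = int((y - min_y) / cell_width)
        col = int((x - min_x) / cell_width)
        cells[(row, col)].append(i)
    return min_x, min_y, cells


def knn_search(query, data, grid, cell_width, k):
    """Return the indices of the k data points nearest to `query`."""
    min_x, min_y, cells = grid
    k = min(k, len(data))
    if k <= 0:
        return []
    # (1) Locate the interpolated point in the grid.
    row = math.floor((query[1] - min_y) / cell_width)
    col = math.floor((query[0] - min_x) / cell_width)
    candidates = []
    level = 0
    while True:
        # (2) Add the ring of cells at the current expansion level.
        for r in range(row - level, row + level + 1):
            for c in range(col - level, col + level + 1):
                if max(abs(r - row), abs(c - col)) == level:
                    candidates.extend(cells.get((r, c), ()))
        if len(candidates) >= k:
            # (3) Sort the candidates and keep the k nearest.
            candidates.sort(key=lambda i: math.dist(query, data[i]))
            kth = math.dist(query, data[candidates[k - 1]])
            # Every unvisited point is at least level * cell_width away,
            # so the result is exact once the k-th distance is within it.
            if kth <= level * cell_width:
                return candidates[:k]
        level += 1


data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (5.0, 5.0)]
grid = build_grid(data, cell_width=2.0)
nearest = knn_search((1.2, 1.1), data, grid, cell_width=2.0, k=2)
```

The sketch shows both costs discussed in this paper: a large cell puts many candidates into the sort, while a small cell forces many expansion levels.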
3 Methods
In large-scale spatial interpolation, a local set of data points is usually used to predict the value at each interpolated point. The computation therefore commonly consists of two procedures: (1) the kNN search procedure, and (2) the interpolating procedure. An efficient kNN search procedure helps to improve the computational efficiency of the entire spatial interpolation process.
In the kNN search based on a uniform grid, one of the critical steps is to determine the size of the grid cell and then create the even grid. When searching for the k nearest data points, the search region is expanded level by level around the cell containing the interpolated point until the required number of data points is found. If the grid cell is too large relative to the density of the data points, each cell contains too many points. In this case, the number of data points in the current level of grid cells far exceeds the required k, and the redundant data points have to be removed by sorting, which can incur significant extra computational cost. In contrast, if the grid cell is very small, the search region has to be expanded many times to cover enough data points, and this expansion can also incur significant extra computational cost.
In summary, the size of the uniform grid cell can strongly affect the computational efficiency of the kNN search procedure, and it should be neither too large nor too small. Our objective in this paper is to find a relatively optimal grid cell size by considering the distribution of the scattered points.
Several factors may affect the choice of the grid cell size, including the value of k, the density of the data points, and two metrics of the data distribution (i.e., the mean and the Standard Deviation of the points’ coordinates).
The basic idea in this paper is as follows. By changing the size of grid cells, the efficiency of kNN search is first analyzed, and then the influences of the several factors on the size of grid cells are discussed. Finally, we fit the relationships between the several factors and the relatively optimal size of the grid cell.
In the original formula, the size of the grid cell is constant for a given distribution of data points. In this paper, we multiply the original formula by a coefficient w. The original formula for calculating the cellWidth in two dimensions is given in Eq. (1). The formula used for changing the cellWidth in two dimensions, \(cellWidth_{used}^{2D} = w_{2D} \cdot cellWidth_0^{2D} \), is given in Eq. (2).
Here, \(cellWidth_0^{2D} \) is the size of the original grid cell in two dimensions, \(cellWidth_{used}^{2D} \) is the size of the used grid cell in two dimensions, \(w_{2D} \) is the coefficient in two dimensions, \(dnum_{2D} \) is the number of known data points in two dimensions, \(A_{Box} \) is the area of the bounding box, and \(V_{Box} \) is the volume of the bounding box. The relationship between each factor and the coefficient w of grid cell size is discussed in the following sections.
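To make the role of the coefficient w concrete, the sketch below scales a base cell width by w. Eq. (1) is not reproduced in this text, so the base width used here, the square root of the bounding-box area divided by the number of data points (roughly one point per cell on average), is only an assumption based on the symbols \(A_{Box} \) and \(dnum_{2D} \); it may differ from the authors' formula. The function name is hypothetical.

```python
import math


def cell_width_2d(points, w=1.0):
    """Return w times an ASSUMED base cell width for 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Area of the axis-aligned bounding box (A_Box).
    area_box = (max(xs) - min(xs)) * (max(ys) - min(ys))
    dnum = len(points)
    # ASSUMPTION: stand-in for Eq. (1), not confirmed by the source.
    base_width = math.sqrt(area_box / dnum)
    # Scaling by the coefficient w, as described for Eq. (2).
    return w * base_width


corners = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
width = cell_width_2d(corners, w=3.0)
```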
4 Results and Discussion
4.1 Benchmark Environment and Testing Data
We carry out five groups of benchmark tests in two dimensions on a workstation. The specifications of the workstation are listed in Table 1.
For each group of the two-dimensional testing data, each set of data points is created by randomly distributing points on a parametric surface whose equation is given in Eq. (3). More specifically, both the x and y coordinates are randomly generated in the range of 0 to 1000, and the associated value is then calculated according to Eq. (3). The five sets of interpolated points are generated in the same way as the data points: both the x and y coordinates of each interpolated point are randomly generated in the range of 0 to 1000.
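A test set of this kind could be generated along the following lines. Eq. (3) is not reproduced in this text, so `placeholder_surface` is a purely hypothetical stand-in, and the uniform sampling, the function names, and the seeds are likewise assumptions made only for illustration.

```python
import math
import random


def placeholder_surface(x, y):
    # HYPOTHETICAL stand-in for Eq. (3); not the surface used in the paper.
    return math.sin(x / 100.0) * math.cos(y / 100.0)


def generate_points(n, lo=0.0, hi=1000.0, seed=None):
    """Return n tuples (x, y, value) with x and y drawn from [lo, hi]."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        points.append((x, y, placeholder_surface(x, y)))
    return points


# Data points carry a value; interpolated points only need coordinates.
data_points = generate_points(1000, seed=1)
interpolated_points = [(x, y) for x, y, _ in generate_points(1000, seed=2)]
```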
4.2 Benchmark Results in Two-Dimensions
The test data in two dimensions are listed in Table 2, which gives the number of irregularly distributed data points and the number of interpolated points. For the irregularly distributed data points, the number of interpolated points is the same.
Influence of the Value of k on the Relatively Optimal Coefficient w of Grid Cell Size for Irregularly Distributed Scattered Points. This subsection discusses the effect of different k values and different point densities on the relatively optimal coefficient w of grid cell size for irregularly distributed scattered points. For the irregular spatial distribution used here, the mean of the points’ coordinates is 500 and the Standard Deviation is 166. In the benchmark tests, the value of k is specified as 10, 20, 50, 100, and 200.
The benchmark results illustrated in Fig. 2 indicate that when the point density is set to Size 1 and w is approximately 3.0, the highest efficiency is achieved for all tested values of k. Moreover, the trends of the fitted curves are similar for different values of k. For the other four point densities (i.e., the numbers of data points), almost the same conclusions can be drawn. It can be concluded that the value of k has only a weak effect on the relatively optimal coefficient w of grid cell size for irregularly distributed scattered points; see Fig. 3.
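The two distribution metrics can be computed per coordinate axis as sketched below. The text does not say whether the population or the sample Standard Deviation is used; the population form (`statistics.pstdev`) is an assumption here, as is the function name.

```python
import statistics


def coordinate_statistics(points):
    """Return the per-axis mean and Standard Deviation of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "mean": (statistics.fmean(xs), statistics.fmean(ys)),
        # ASSUMPTION: population Standard Deviation.
        "sd": (statistics.pstdev(xs), statistics.pstdev(ys)),
    }


stats = coordinate_statistics([(0.0, 0.0), (2.0, 4.0), (4.0, 8.0)])
```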
Influence of Point Density on the Relatively Optimal Coefficient w of Grid Cell Size for Irregularly Distributed Scattered Points. This subsection discusses the relationship between the point density and the coefficient w of grid cell size for fixed values of k. In the benchmark tests, the point densities are specified as Size 1, Size 2, Size 3, Size 4, and Size 5 for irregularly distributed scattered points.
The benchmark results illustrated in Fig. 4 indicate that when the value of k is set to 10 and w is approximately 3.0, the highest efficiency is achieved for all tested point densities. Moreover, the trends of the fitted curves are similar for different point densities. For the other values of k, almost the same conclusions can be drawn for irregularly distributed scattered points. It can be concluded that the point density has only a weak effect on the relatively optimal coefficient w of grid cell size for irregularly distributed scattered points; see Fig. 5.
Influence of Mean of Points’ Coordinates on the Relatively Optimal Coefficient w of Grid Cell Size for Irregularly Distributed Scattered Points. This subsection discusses the relationship between the mean of the points’ coordinates and the coefficient w of grid cell size with the other factors fixed. In the benchmark tests, the mean of the points’ coordinates is specified as (400,400), (600,400), (600,600), and (400,600). The number of data points is 67766, the number of interpolated points is 72301, the Standard Deviation is 200, and the value of k is 50.
The benchmark results illustrated in Fig. 6 indicate that the trends of the fitted curves are similar for different means of the points’ coordinates, and that the relatively optimal coefficient w of grid cell size, at which the highest efficiency is achieved, is close to 2.5 in all cases. It can be concluded that the mean of the points’ coordinates has only a weak effect on the relatively optimal coefficient w of grid cell size for irregularly distributed scattered points.
Influence of Standard Deviation of Points’ Coordinates on the Relatively Optimal Coefficient w of Grid Cell Size for Irregularly Distributed Scattered Points. This subsection discusses the relationship between the Standard Deviation and the coefficient w of grid cell size with the other factors fixed. In the benchmark tests, the number of data points is 67766, the number of interpolated points is 72301, the mean of x and y is 500, and the value of k is 50. The Standard Deviation is specified as 100, 130, 160, 190, 200, 250, 300, 350, and 400. The benchmark results indicate that as the Standard Deviation of the points’ coordinates increases, the relatively optimal size of the grid cell decreases and eventually converges; see Table 3. We have also fitted the relationship between the Standard Deviation of the scattered points’ coordinates and the relatively optimal size of the grid cell in two dimensions; see Fig. 7. The fitted relationship is described in Eq. (4).
To evaluate the goodness of fit, we use the Coefficient of Determination (COD). The COD of the fitted equation is 0.98192, which indicates a good fit.
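For reference, the COD can be computed as sketched below, assuming the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares); the text does not state which definition or fitting tool was used. The numbers in the example are toy values and are not taken from Table 3.

```python
def coefficient_of_determination(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed/predicted values."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares between observations and fitted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
cod = coefficient_of_determination([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A value close to 1 means that the fitted curve explains most of the variation in the observed values.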
5 Conclusion
In this paper, we have investigated the effect of uniform grid decomposition on the computational efficiency of the kNN search procedure used in spatial interpolation. More precisely, we have evaluated the influence of the size of the grid cell on the efficiency of the kNN search procedure, with the objective of finding a relatively optimal grid cell size. We have performed several series of benchmarks on irregularly distributed scattered points and found that the distribution of the scattered points, measured by the Standard Deviation of the points’ coordinates, has a strong influence on the relatively optimal size of the grid cell. More specifically, the benchmark results indicate that, in two dimensions, as the Standard Deviation of the points’ coordinates increases, the relatively optimal size of the grid cell decreases and eventually converges. We have also fitted the relationship between the Standard Deviation of the scattered points’ coordinates and the relatively optimal size of the grid cell; the COD of the fitted equation is 0.98192, which indicates a good fit. The fitted relationship could be employed to determine a relatively optimal grid cell size in kNN search and thereby improve the computational efficiency of spatial interpolation, which is commonly used in geometric modeling for life science applications.
In this paper, we have only evaluated the effect of the size of the grid cell on the efficiency of kNN search executed on the CPU. The kNN search procedure contains several logic routines, and it is well known that the same logic routines executed on the CPU and on the GPU may achieve dramatically different efficiencies. Thus, the relationship between the distribution of the scattered points and the relatively optimal size of the grid cell obtained on the CPU may differ from that obtained on the GPU. In the future, we will address this problem.
References
1. Al Aghbari, Z., Al-Hamadi, A.: Efficient KNN search by linear projection of image clusters. Int. J. Intell. Syst. 26(9), 844–865 (2011)
2. Allen, G., Gandevia, S., Mckenzie, D.: Reliability of measurements of muscle strength and voluntary activation using twitch interpolation. Muscle Nerve 18(6), 593–600 (1995)
3. Beliakov, G., Li, G.: Improving the speed and stability of the k-nearest neighbors method. Pattern Recognit. Lett. 33(10), 1296–1301 (2012)
4. Cavoretto, R., Rossi, A.D., Dell’Accio, F., Tommaso, F.D.: Fast computation of triangular shepard interpolants. J. Comput. Appl. Math. (2018). https://doi.org/10.1016/j.cam.2018.03.012
5. Cuomo, S., Galletti, A., Giunta, G., Marcellino, L.: Reconstruction of implicit curves and surfaces via RBF interpolation. Appl. Numer. Math. 116(1), 157–171 (2017)
6. Cuomo, S., Galletti, A., Giunta, G., Starace, A.: Surface reconstruction from scattered point via RBF interpolation on GPU. In: Ganzha, M., Maciaszek, L., Paprzycki, M. (eds.) 2013 Federated Conference on Computer Science and Information Systems (Fedcsis), pp. 433–440 (2013)
7. Ding, Z., Mei, G., Cuomo, S., Xu, N., Tian, H.: Performance evaluation of GPU-accelerated spatial interpolation using radial basis functions for building explicit surfaces. Int. J. Parallel Program. 46(5), 963–991 (2018)
8. Dong, W., Zhang, L., Lukac, R., Shi, G.: Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans. Image Process. 22(4), 1382–1394 (2013)
9. Huang, F., Bu, S., Tao, J., Tan, X.: OpenCL implementation of a parallel universal Kriging algorithm for massive spatial data interpolation on heterogeneous systems. ISPRS Int. J. Geo Inf. 5(6), 96 (2016)
10. Huang, F., Liu, D., Tan, X., Wang, J., Chen, Y., He, B.: Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS. Comput. Geosci. 37(4), 426–434 (2011)
11. Lehmann, T., Gonner, C., Spitzer, K.: Survey: interpolation methods in medical image processing. IEEE Trans. Med. Imaging 18(11), 1049–1075 (1999)
12. Li, S., Harner, E.J., Adjeroh, D.A.: Random KNN. In: Zhou, Z.H., et al. (eds.) 2014 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 629–636 (2014)
13. Liu, J., Nowinski, W.L.: A hybrid approach to shape-based interpolation of stereotactic atlases of the human brain. Neuroinformatics 4(2), 177–198 (2006)
14. Liu, S.g., Wei, Y.w.: Fast nearest neighbor searching based on improved VP-tree. Pattern Recognit. Lett. 60–61, 8–15 (2015)
15. Mei, G.: Evaluating the power of GPU acceleration for IDW interpolation algorithm. Sci. World J. 2014, 8 (2014)
16. Mei, G., Xu, L., Xu, N.: Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit. R. Soc. Open Sci. 4(9), 170436 (2017)
17. Mei, G., Xu, N., Xu, L.: Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search. SpringerPlus 5, 1389 (2016)
18. Meijering, E.: A chronology of interpolation: from ancient astronomy to modern signal and image processing. Proc. IEEE 90(3), 319–342 (2002)
19. Mirzaei, D.: Analysis of moving least squares approximation revisited. J. Comput. Appl. Math. 282, 237–250 (2015)
20. Nutanong, S., Zhang, R., Tanin, E., Kulik, L.: V*-kNN: an efficient algorithm for moving k nearest neighbor queries. In: ICDE: 2009 IEEE International Conference on Data Engineering, pp. 1519–1522 (2009)
21. Pan, J., Manocha, D.: Bi-level locality sensitive hashing for k-nearest neighbor computation. In: 2012 IEEE 28th IEEE International Conference on Data Engineering, pp. 378–389 (2012)
22. Pan, M.s., Yang, X.l., Tang, J.t.: Research on interpolation methods in medical image processing. J. Med. Syst. 36(2), 777–807 (2012)
23. Parrott, R., Stytz, M., Amburn, P., Robinson, D.: Towards statistically optimal interpolation for 3-D medical imaging. IEEE Eng. Med. Biol. Mag. 12(3), 49–59 (1993)
24. Pesquer, L., Cortes, A., Pons, X.: Parallel ordinary Kriging interpolation incorporating automatic variogram fitting. Comput. Geosci. 37(4), 464–473 (2011)
25. Shankar, V., Wright, G.B., Kirby, R.M., Fogelson, A.L.: A radial basis function (RBF)-finite difference (FD) method for diffusion and reaction-diffusion equations on surfaces. J. Sci. Comput. 63(3), 745–768 (2015)
26. Volkau, I., Aziz, A., Nowinski, W.: Indirect interpolation of subcortical structures in the Talairach-Tournoux atlas, vol. 5367, pp. 533–537 (2004)
27. Wang, J., Liu, G.: A point interpolation meshless method based on radial basis functions. Int. J. Numer. Methods Eng. 54(11), 1623–1648 (2002)
Acknowledgements
This work was supported by the Natural Science Foundation of China (Grant Numbers 11602235 and 41772326), the China Postdoctoral Science Foundation (2015M571081), and the Fundamental Research Funds for the Central Universities (2652017086).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fan, N., Mei, G., Ding, Z., Cuomo, S., Xu, N. (2019). Effect of Spatial Decomposition on the Efficiency of k Nearest Neighbors Search in Spatial Interpolation. In: Mencagli, G., et al. Euro-Par 2018: Parallel Processing Workshops. Euro-Par 2018. Lecture Notes in Computer Science(), vol 11339. Springer, Cham. https://doi.org/10.1007/978-3-030-10549-5_52
DOI: https://doi.org/10.1007/978-3-030-10549-5_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10548-8
Online ISBN: 978-3-030-10549-5
eBook Packages: Computer Science, Computer Science (R0)