ABSTRACT
The R-tree is a data structure for multidimensional indexing. Essentially, it is a balanced tree of nested hyper-rectangles that are used to locate the data. One of the most performance-sensitive parts of this data structure is its split algorithm, which runs when a node overflows. A split can be performed in multiple ways and according to many different criteria, and in general the problem of finding an optimal split is NP-hard. Many heuristic split algorithms exist. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which led us to redesign the k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report comparison results obtained using PostgreSQL and a contemporary benchmark for multidimensional index structures.
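To make the idea of a k-means node split concrete, the following is a minimal sketch of a naive variant in two dimensions. It is an illustration under stated assumptions, not the algorithm studied or proposed in the paper: rectangles are represented by their centers, the two seeds are the farthest-apart centers, and plain Lloyd iterations with k = 2 assign entries to groups. The names `Rect`, `center`, `mbr`, and `kmeans_split` are hypothetical, and the sketch does not enforce a minimum node fill.

```python
from typing import List, Tuple

# Assumption: a 2-D rectangle stored as (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]
Point = Tuple[float, float]


def center(r: Rect) -> Point:
    """Center point of a rectangle."""
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def kmeans_split(rects: List[Rect], max_iter: int = 20) -> Tuple[List[Rect], List[Rect]]:
    """Split the entries of an overflowing node into two groups (needs >= 2 entries)."""
    pts = [center(r) for r in rects]
    n = len(pts)
    # Seed with the two centers that are farthest apart (O(n^2), fine for one node).
    i0, i1 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist2(pts[ij[0]], pts[ij[1]]))
    centroids = [pts[i0], pts[i1]]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each entry goes to the nearer centroid (ties go to group 0).
        new = [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1 for p in pts]
        if new == assign:
            break  # converged
        assign = new
        # Update step: move each centroid to the mean of its members.
        for g in (0, 1):
            members = [p for p, a in zip(pts, assign) if a == g]
            if members:
                centroids[g] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    left = [r for r, a in zip(rects, assign) if a == 0]
    right = [r for r, a in zip(rects, assign) if a == 1]
    if not left or not right:
        # Degenerate case (e.g. all centers coincide): fall back to halving.
        half = n // 2
        left, right = list(rects[:half]), list(rects[half:])
    return left, right
```

Calling `kmeans_split(entries)` on the entries of an overflowing node returns two lists, and `mbr` of each list gives the bounding rectangle of the corresponding new node. Reducing each rectangle to its center discards its extent, which is one reason such a naive sketch should be read only as a starting point.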