poster

K-means Split Revisited: Well-grounded Approach and Experimental Evaluation

Authors:
Valentin Grigorev, Saint-Petersburg State University, Saint-Petersburg, Russian Fed.
George Chernishev, Saint-Petersburg State University, Saint-Petersburg, Russian Fed.

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2251–2252
https://doi.org/10.1145/2882903.2914833

Published: 26 June 2016

ABSTRACT

The R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree of nested hyper-rectangles that are used to locate the data. One of the most performance-sensitive parts of this data structure is its split algorithm, which runs when a node overflows. A split can be performed in multiple ways, according to many different criteria, and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which led us to redesign the k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report comparison results obtained using PostgreSQL and a contemporary benchmark for multidimensional structures.
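
To make the notion of a clustering-based node split concrete, the following is a minimal sketch of a generic 2-means split over the centers of 2D rectangles. It is not the algorithm studied or proposed in this paper, whose details the abstract does not give. The names `Rect`, `two_means_split`, and `mbr`, the farthest-pair seeding, and the fallback for degenerate input are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]
Point = Tuple[float, float]


def center(r: Rect) -> Point:
    """Center point of a rectangle."""
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def two_means_split(rects: List[Rect],
                    max_iter: int = 20) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's entries into two groups using
    Lloyd-style 2-means on the rectangle centers (illustrative only)."""
    if len(rects) < 2:
        raise ValueError("need at least two entries to split")
    points = [center(r) for r in rects]
    n = len(points)

    # Assumed seeding: the two centers that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: sq_dist(points[p[0]], points[p[1]]))
    centroids = [points[i0], points[j0]]

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each center goes to its nearest centroid.
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))

    left = [r for r, lab in zip(rects, labels) if lab == 0]
    right = [r for r, lab in zip(rects, labels) if lab == 1]

    # Assumed fallback: a node split must yield two non-empty groups, so
    # degenerate input (e.g. identical centers) is simply cut in half.
    if not left or not right:
        half = len(rects) // 2
        left, right = rects[:half], rects[half:]
    return left, right
```

Calling `two_means_split` on the entries of an overflowing node returns two groups, and `mbr` of each group gives the bounding rectangle of the corresponding new node. The sketch ignores minimum fill constraints and uses only rectangle centers, discarding extent, which is one kind of simplification that a more careful treatment would need to address.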

Index Terms

1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods
  2. Information systems applications
    1. Data mining
      1. Clustering
    2. Spatial-temporal systems
      1. Geographic information systems

Published in
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
General Chairs:
Fatma Özcan, IBM Research, USA
Georgia Koutrika, HP Labs, USA
Program Chair:
Sam Madden, Massachusetts Institute of Technology, USA
Copyright © 2016 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
k-means
multidimensional indexing
r-tree
r-tree split
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

