poster

Improving the performance of k-means clustering through computation skipping and data locality optimizations

Authors:
Orhan Kislal, The Pennsylvania State University, University Park, PA, USA
Piotr Berman, The Pennsylvania State University, University Park, PA, USA
Mahmut Kandemir, The Pennsylvania State University, University Park, USA

CF '12: Proceedings of the 9th conference on Computing Frontiers, May 2012, Pages 273–276, https://doi.org/10.1145/2212908.2212951

Published: 15 May 2012

ABSTRACT

We present three optimization techniques for the k-means clustering algorithm that reduce its running time without significantly decreasing the accuracy of the cluster centers. The first optimization restructures loops to improve cache behavior when executing on multicore architectures. The remaining two optimizations skip selected points to reduce execution latency. Our sensitivity analysis suggests that performance can be enhanced through a good understanding of the data and careful configuration of the parameters.
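The abstract does not say how points are chosen for skipping, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one plausible criterion: a point whose cluster assignment has stayed the same for `stable_after` consecutive iterations is no longer re-evaluated and keeps its previous assignment. The function name, the `stable_after` parameter, and the returned distance counter are all hypothetical and introduced for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_with_skipping(points, centers, max_iters=20, stable_after=2):
    """Lloyd-style k-means that skips points whose assignment looks stable.

    Assumption (not from the paper): once a point has kept the same cluster
    for `stable_after` consecutive iterations, its distance computations are
    skipped and the previous assignment is reused.
    """
    k = len(centers)
    dim = len(centers[0])
    centers = [list(c) for c in centers]
    assign = [-1] * len(points)   # -1 means "not assigned yet"
    streak = [0] * len(points)    # consecutive iterations without a change
    distance_evals = 0            # counts point-to-center distance computations

    for _ in range(max_iters):
        changed = False

        # Assignment step, with computation skipping.
        for i, p in enumerate(points):
            if assign[i] >= 0 and streak[i] >= stable_after:
                continue  # skipped: reuse the previous assignment
            best = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            distance_evals += k
            if best == assign[i]:
                streak[i] += 1
            else:
                assign[i] = best
                streak[i] = 0
                changed = True

        # Update step: every point contributes, including skipped ones.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, a in zip(points, assign):
            counts[a] += 1
            for d in range(dim):
                sums[a][d] += p[d]
        for c in range(k):
            if counts[c]:  # leave an empty cluster's center where it was
                centers[c] = [s / counts[c] for s in sums[c]]

        if not changed:
            break

    return centers, assign, distance_evals


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    result = kmeans_with_skipping(data, [(0, 0), (10, 10)])
    print(result)
```

A smaller `stable_after` skips more aggressively and saves more distance computations, at the risk of freezing a point in a cluster it would later have left. That trade-off is one way to read the abstract's remark that the parameters need careful configuration.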

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering


Published in
CF '12: Proceedings of the 9th conference on Computing Frontiers
May 2012
320 pages
ISBN:9781450312158
DOI:10.1145/2212908
General Chair: John Feo, Pacific Northwest National Laboratory, USA
Program Chairs: Paolo Faraboschi, HP Labs, Spain; Oreste Villa, Pacific Northwest National Laboratory, USA
Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
data mining
k-means algorithm
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 240 of 680 submissions, 35%