short-paper

Efficient k-means on GPUs

Authors:
Clemens Lutz (DFKI GmbH), Sebastian Breß (DFKI GmbH), Tilmann Rabl (TU Berlin), Steffen Zeuch (DFKI GmbH), Volker Markl (TU Berlin)

DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware, June 2018, Article No.: 3, Pages 1–3, https://doi.org/10.1145/3211922.3211925

Published: 11 June 2018


ABSTRACT

k-Means is a versatile clustering algorithm that is widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU.

We show that this approach has two main drawbacks. First, it separates the two algorithm phases over different processors, which requires an expensive data exchange between devices. Second, even when both phases are computed on the GPU, the same data are read twice per iteration, leading to inefficient use of memory bandwidth.
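
To make the second drawback concrete, here is a minimal CPU sketch in NumPy of one conventional k-means iteration with separate phases. It is an illustration only, not the baseline implementation evaluated in the paper; the function names (`assign_points`, `update_centroids`, `two_pass_iteration`) are hypothetical. Note that the point data are read once in each phase.

```python
import numpy as np

def assign_points(points, centroids):
    """Phase 1: label each point with the index of its nearest centroid."""
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    diffs = points[:, None, :] - centroids[None, :, :]
    dists = np.einsum("nkd,nkd->nk", diffs, diffs)
    return dists.argmin(axis=1)

def update_centroids(points, labels, centroids):
    """Phase 2: recompute each centroid as the mean of its assigned points."""
    new_centroids = centroids.copy()
    for j in range(centroids.shape[0]):
        members = points[labels == j]      # second read of the point data
        if len(members) > 0:               # an empty cluster keeps its old centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

def two_pass_iteration(points, centroids):
    """One iteration with two separate passes over the points (hypothetical helper)."""
    labels = assign_points(points, centroids)                    # pass 1
    return update_centroids(points, labels, centroids), labels   # pass 2
```

When the two phases run on different processors, the labels (and the points, if they are not already resident on both devices) must additionally cross the interconnect between the phases in every iteration.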

In this paper, we describe a new approach that executes k-means in a single data pass per iteration. We propose a new algorithm to update centroids that allows us to perform both phases efficiently on GPUs. Thereby, we remove data transfers within each iteration. We fuse both phases to eliminate artificial synchronization barriers, and thus compute k-means in a single data pass. Overall, we achieve up to 20x higher throughput compared to the state-of-the-art approach.
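
The idea of fusing both phases can be sketched as follows. This is a CPU illustration in NumPy under our own assumptions, not the authors' GPU kernel: the abstract does not describe the centroid update algorithm itself, so the per-cluster running sums and counts used here are a stand-in technique, and `fused_iteration` and `chunk_size` are hypothetical names. Each chunk of points is read once, labeled, and immediately folded into the accumulators.

```python
import numpy as np

def fused_iteration(points, centroids, chunk_size=1024):
    """One k-means iteration that reads each point exactly once (illustrative sketch)."""
    k, d = centroids.shape
    sums = np.zeros((k, d), dtype=np.float64)   # per-cluster coordinate sums
    counts = np.zeros(k, dtype=np.int64)        # per-cluster point counts

    for start in range(0, points.shape[0], chunk_size):
        chunk = points[start:start + chunk_size]          # the only read of these points
        # Assignment: nearest centroid by squared Euclidean distance.
        diffs = chunk[:, None, :] - centroids[None, :, :]
        labels = np.einsum("nkd,nkd->nk", diffs, diffs).argmin(axis=1)
        # Update: accumulate while the chunk is still in hand.
        np.add.at(sums, labels, chunk)                    # unbuffered scatter-add
        counts += np.bincount(labels, minlength=k)

    # Finalize: divide sums by counts; an empty cluster keeps its old centroid.
    new_centroids = centroids.astype(np.float64, copy=True)
    nonempty = counts > 0
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids
```

On a GPU, the scatter-add step would presumably be replaced by atomic additions or a per-block reduction; that choice is an assumption here and is not stated in the abstract.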



Published in
DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware
June 2018
75 pages
ISBN: 9781450358538
DOI: 10.1145/3211922
Conference Chairs: Wolfgang Lehner (Technische Universität Dresden), Ken Salem (University of Waterloo)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 80 of 102 submissions, 78%