Efficient and Scalable k‑Means on GPUs

Lutz, Clemens; Breß, Sebastian; Rabl, Tilmann; Zeuch, Steffen; Markl, Volker

doi:10.1007/s13222-018-0293-x

Focus article (Schwerpunktbeitrag)
Published: 06 September 2018

Datenbank-Spektrum, volume 18, pages 157–169 (2018)

Clemens Lutz ORCID: orcid.org/0000-0002-6193-4734¹,
Sebastian Breß¹,
Tilmann Rabl²,
Steffen Zeuch¹ &
Volker Markl²

Abstract

k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU.

We identify two main shortcomings of this approach. First, it requires expensive data exchange between processors when switching between the two processing steps, point assignment and centroid update. Second, even when both steps of k-means run on the same processor, points still need to be read twice within an iteration, leading to inefficient use of memory bandwidth.

In this paper, we present a novel approach for centroid update that allows us to efficiently process both phases of k-means on GPUs. We fuse point assignment and centroid update to execute one iteration with a single pass over the points. Our evaluation shows that our k-means approach scales to very large data sets. Overall, we achieve up to 20× higher throughput compared to the state-of-the-art approach.
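The difference between reading the points twice and reading them once can be illustrated with a small sequential sketch. This is not the authors' GPU implementation, which is not reproduced on this page. It is a plain CPU illustration of the fusion idea only. The function names (`nearest_centroid`, `two_pass_iteration`, `fused_iteration`) and the choice to leave an empty cluster's centroid unchanged are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_centroid(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def _finish(sums, counts, old_centroids) -> List[List[float]]:
    """Turn per-cluster sums and counts into new centroids."""
    new_centroids = []
    for j, count in enumerate(counts):
        if count == 0:
            # Assumption for this sketch: an empty cluster keeps its centroid.
            new_centroids.append(list(old_centroids[j]))
        else:
            new_centroids.append([s / count for s in sums[j]])
    return new_centroids


def two_pass_iteration(points, centroids) -> Tuple[List[List[float]], List[int]]:
    """Baseline: assign all points first, then re-read them to update centroids."""
    k, dims = len(centroids), len(centroids[0])
    labels = [nearest_centroid(p, centroids) for p in points]  # first read
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, label in zip(points, labels):  # second read of the points
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += point[d]
    return _finish(sums, counts, centroids), counts


def fused_iteration(points, centroids) -> Tuple[List[List[float]], List[int]]:
    """Fused variant: assign and accumulate while each point is read once."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point in points:  # single pass over the points
        label = nearest_centroid(point, centroids)
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += point[d]
    return _finish(sums, counts, centroids), counts


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    start = [[0.0, 0.0], [10.0, 10.0]]
    print(fused_iteration(pts, start))
```

Both functions compute the same centroids. The fused one never stores a label array and touches each point once, which is the property the abstract attributes to the single-pass approach. On a GPU the accumulation step would additionally need a parallel aggregation scheme, which this sketch does not attempt to model.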

Notes

We previously sketched our work as a two-page short paper [25].
Note that the Cross-Processing strategy uses the GPU for point assignment, whereas Single-Pass and Multi-Pass are executed on CPU only. Therefore we include the Cross-Processing strategy in both plots.

References

[1] Amazon EC (2018) Amazon EC2 pricing. https://aws.amazon.com/ec2/pricing/on-demand. Accessed: 25 May 2018
[2] Arthur D, Vassilvitskii S (2007) k‑means++: The advantages of careful seeding. In: ACM-SIAM, pp 1027–1035
[3] Bai H et al (2009) k‑means on commodity GPUs with CUDA. In: WRI CSIE, pp 651–655
[4] Breß S, Funke H, Teubner J (2016) Robust query processing in co-processor-accelerated databases. In: SIGMOD, pp 1891–1906
[5] Breß S et al (2017) Generating custom code for efficient query execution on heterogeneous processors. CoRR abs/1709.00700
[6] Cao F, Tung AKH, Zhou A (2006) Scalable clustering using graphics processors. In: WAIM, pp 372–384
[7] Cassou C (2008) Intraseasonal interaction between the Madden–Julian Oscillation and the North Atlantic Oscillation. Nature 455(7212):523–527
[8] Che S et al (2009) Rodinia: a benchmark suite for heterogeneous computing. In: IISWC, pp 44–54
[9] Dall M et al (2017) Arctic sea ice melt leads to atmospheric new particle formation. Sci Rep 7(1):3318
[10] Elkan C (2003) Using the triangle inequality to accelerate k‑means. In: ICML, pp 147–153
[11] Fang W et al (2008) Parallel data mining on graphics processors. Tech. Rep. HKUST-CS08-07, HKUST
[12] Farivar R et al (2008) A parallel implementation of k‑means clustering on GPUs. In: PDPTA, pp 340–345
[13] Fernando R (2004) GPU gems: programming techniques, tips and tricks for real-time graphics. In: Pearson higher education (chap 37.2)
[14] Funke H et al (2018) Pipelined query processing in coprocessor environments. In: SIGMOD, ACM
[15] Hall J, Hart J (2004) GPU acceleration of iterative clustering. In: GPGPU, pp 45–52
[16] He B et al (2009) Relational query coprocessing on graphics processors. ACM Trans Database Syst. https://doi.org/10.1145/1620585.1620588
[17] Heimel M et al (2013) Hardware-oblivious parallelism for in-memory column-stores. Proceedings VLDB Endowment 6(9):709–720
[18] Heintzman ND et al (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39(3):311
[19] Hellerstein J et al (2012) The MADlib analytics library or MAD skills, the SQL. Proceedings VLDB Endowment 5(12):1700–1711
[20] Karnagel T, Müller R, Lohman GM (2015) Optimizing GPU-accelerated group-by and aggregation. In: ADMS, pp 13–24
[21] Kleisner KM et al (2016) The effects of sub-regional climate velocity on the distribution and spatial extent of marine species assemblages. PLoS ONE 11:1–21
[22] Lee S et al (2016) Evaluation of k‑means data clustering algorithm on Intel Xeon Phi. In: BigData, pp 2251–2260
[23] Li Y et al (2010) Speeding up k‑means algorithm by GPUs. In: IEEE CIT, pp 115–122
[24] Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136
[25] Lutz C et al (2018) Efficient k‑means on GPUs. In: DaMoN. https://doi.org/10.1145/3211922.3211925
[26] MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., vol 1, pp 281–297
[27] Mhembere D et al (2017) knor: A NUMA-optimized in-memory, distributed and semi-external-memory k‑means library. In: HPDC
[28] Müller I et al (2015) Cache-efficient aggregation: hashing is sorting. In: SIGMOD, pp 1123–1136
[29] Nugteren C et al (2011) High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs. In: GPGPU, p 1
[30] Nvidia (2017a) CUDA C programming guide. Tech. Rep. PG-02829-001_v8.0. http://docs.nvidia.com/pdf/CUDA_C_Programming_Guide.pdf. Accessed: 20 Jan 2017
[31] Nvidia (2017b) Tuning CUDA applications for Maxwell. Tech. Rep. DA-07173-001_v9.0. http://docs.nvidia.com/cuda/pdf/Maxwell_Tuning_Guide.pdf. Accessed: 20 Jan 2017
[32] Passing L et al (2017) SQL- and operator-centric data analytics in relational main-memory databases. In: EDBT, pp 84–95
[33] Pirk H, Manegold S, Kersten ML (2014) Waste not…efficient co-processing of relational data. In: ICDE, pp 508–519
[34] Pirk H et al (2016) Voodoo – A vector algebra for portable database performance on modern hardware. Proceedings VLDB Endowment 9(14):1707–1718
[35] Sanderson C, Curtin R (2016) Armadillo: a template-based C++ library for linear algebra. J Open Source Softw. https://doi.org/10.21105/joss.00026
[36] Shalom A, Dash M, Tue M (2008) Efficient k‑means clustering using accelerated graphics processors. In: DaWaK, pp 166–175
[37] Shindler M, Wong A, Meyerson AW (2011) Fast and accurate k‑means for large datasets. In: NIPS, pp 2375–2383
[38] Sitaridi EA, Ross KA (2013) Optimizing select conditions on GPUs. In: DaMoN, p 4
[39] Stehle E, Jacobsen H (2017) A memory bandwidth-efficient hybrid radix sort on GPUs. In: SIGMOD, pp 417–432
[40] TPC-H (2017) Transaction processing performance council. http://www.tpc.org/tpch. Accessed: 29 Sep 2017
[41] Vitak SA et al (2017) Sequencing thousands of single-cell genomes with combinatorial indexing. Nat Methods 14(3):302
[42] Wu F et al (2013) A vectorized k‑means algorithm for Intel Many Integrated Core architecture. In: APPT, pp 277–294
[43] Zang C et al (2016) High-dimensional genomic data bias correction and data integration using MANCIE. Nat Commun 7:11305
[44] Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: SIGMOD, pp 103–114

Acknowledgements

This work was funded by the EU projects SAGE (671500) and E2Data (780245), DFG Priority Program “Scalable Data Management for Future Hardware” (MA4662-5), and the German Ministry for Education and Research as BBDC (01IS14013A).

Author information

Authors and Affiliations

DFKI GmbH, Berlin, Germany
Clemens Lutz, Sebastian Breß & Steffen Zeuch
TU Berlin, Berlin, Germany
Tilmann Rabl & Volker Markl

Corresponding author

Correspondence to Clemens Lutz.

About this article

Cite this article

Lutz, C., Breß, S., Rabl, T. et al. Efficient and Scalable k‑Means on GPUs. Datenbank Spektrum 18, 157–169 (2018). https://doi.org/10.1007/s13222-018-0293-x

Received: 31 May 2018
Accepted: 10 August 2018
Published: 06 September 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s13222-018-0293-x
