ABSTRACT
k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU.
We show that this approach has two main drawbacks. First, it separates the two algorithm phases over different processors, which requires an expensive data exchange between devices. Second, even when both phases are computed on the GPU, the same data are read twice per iteration, leading to inefficient use of memory bandwidth.
In this paper, we describe a new approach that executes k-means in a single data pass per iteration. We propose a new algorithm to update centroids that allows us to perform both phases efficiently on GPUs. Thereby, we remove data transfers within each iteration. We fuse both phases to eliminate artificial synchronization barriers, and thus compute k-means in a single data pass. Overall, we achieve up to 20x higher throughput compared to the state-of-the-art approach.
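The single-pass idea can be illustrated with a minimal sequential sketch. This is not the GPU algorithm proposed in the paper; it only shows how the assignment and the centroid update can share one read of each point per iteration. The function name `fused_kmeans_iteration` and its signature are hypothetical, chosen for illustration, and keeping the old centroid for an empty cluster is likewise an assumption.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fused_kmeans_iteration(
    points: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """Run one k-means iteration while reading each point exactly once.

    Hypothetical illustration: the assignment and the accumulation for the
    centroid update happen in the same loop body, so no second pass over
    the data is needed.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster coordinate sums
    counts = [0] * k                        # per-cluster point counts

    for p in points:
        # Assignment phase: index of the nearest centroid (squared Euclidean).
        best = min(
            range(k),
            key=lambda c: sum((p[d] - centroids[c][d]) ** 2 for d in range(dim)),
        )
        # Update phase, fused into the same pass: accumulate immediately.
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]

    # Finalize: divide sums by counts. An empty cluster keeps its old
    # centroid (an assumption made for this sketch).
    new_centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] > 0 else list(centroids[c])
        for c in range(k)
    ]
    return new_centroids, counts


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    init = [(0.0, 0.0), (10.0, 10.0)]
    print(fused_kmeans_iteration(pts, init))
```

In a parallel setting, the accumulation into `sums` and `counts` is the step that would require coordination between threads, which this sequential sketch sidesteps.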