Abstract
Density Peaks Clustering (DPC) is a recently proposed clustering algorithm with distinctive advantages over existing clustering algorithms. However, DPC requires computing the distance between every pair of input points, which incurs a quadratic computation overhead that is prohibitive for large data sets. To address this efficiency problem, we propose accelerating DPC on the GPU. We use a spatial index structure, the VP-Tree, to maintain the data points efficiently. We first propose a vectorized, GPU-friendly VP-Tree structure and, based on it, the GDPC algorithm, in which the density \(\rho\) and the dependent distance \(\delta\) are computed efficiently on the GPU. Our results show that GDPC achieves a 5.3–78.8\(\times\) speedup over state-of-the-art DPC implementations.
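To make the quadratic cost concrete, the following is a minimal brute-force sketch of the two DPC quantities named in the abstract. It is not the GDPC algorithm and uses neither a VP-Tree nor a GPU. It assumes the cutoff-kernel definition of density (the number of other points closer than a cutoff distance) and defines the dependent distance as the distance to the nearest point of higher density. The function name `dpc_rho_delta`, the parameter `d_c`, and the rule of breaking density ties by lower index are illustrative assumptions, not taken from the paper.

```python
import math


def dpc_rho_delta(points, d_c):
    """Brute-force DPC quantities (illustrative baseline, not GDPC).

    rho[i]   : number of other points closer than the cutoff d_c (assumed kernel).
    delta[i] : distance to the nearest point of higher density; for the
               densest point, the largest distance to any other point.
    """
    n = len(points)
    # All-pairs distances: this O(n^2) step is the cost the abstract refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = [0.0] * n
    for i in range(n):
        # Assumption: equal densities are ordered by index (lower index wins),
        # so exactly one point ends up with no denser neighbor.
        higher = [
            dist[i][j]
            for j in range(n)
            if (rho[j], -j) > (rho[i], -i)
        ]
        delta[i] = min(higher) if higher else max(dist[i])
    return rho, delta
```

Calling `dpc_rho_delta([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)], 1.5)` gives a density of 2 for the three points near the origin and 1 for the two distant points. Points with both a high density and a large dependent distance are the candidate cluster centers in DPC.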
Acknowledgements
This work was partially supported by the National Key R&D Program of China (2018YFB1003404), the National Natural Science Foundation of China (61672141), and the Fundamental Research Funds for the Central Universities (N181605017, N181604016).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Su, Y., Zhang, Y., Wan, C., Yu, G. (2020). GDPC: A GPU-Accelerated Density Peaks Clustering Algorithm. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_21
DOI: https://doi.org/10.1007/978-3-030-59410-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics (R0)