research-article

Accelerating DBSCAN Algorithm with AI Chips for Large Datasets

Authors:

Cho-Li Wang

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

Article No.: 51, Pages 1 - 11

https://doi.org/10.1145/3472456.3473518

Published: 05 October 2021

Abstract

DBSCAN is a popular clustering algorithm that has achieved great success in many real-world applications. Its advantages come at the cost of heavy computation, especially for computing the distance matrix. Driven by deep learning, many Artificial Intelligence (AI) chips have been developed. With efficient matrix multiplication units, AI chips can significantly accelerate the distance calculation. However, DBSCAN also needs to identify and count the neighbors of each point, which is challenging for most AI chips because of their over-specialization. Moreover, growing data sizes and limited device memory capacity force DBSCAN to process data in mini-batches. This incurs high data transfer overhead, which further hinders the performance of DBSCAN on AI chips. In this paper, we propose two novel techniques to address the challenges of accelerating the DBSCAN algorithm with AI chips: (1) new neighbor identification algorithms that use only bitwise operations, whereas traditional solutions require compare-and-select operations that are weakly supported on AI chips; and (2) two speculative execution strategies that reduce the data transfer overhead induced by mini-batches. Evaluations show that deploying the distance matrix calculation to tensor cores achieves a 2.61× speedup on the Nvidia RTX 3090. On the Huawei Ascend 310, our neighbor identification algorithms achieve 17.88× the throughput of CPU-based neighbor identification. The speculative execution strategies further reduce the execution time by 15.1% on average for normal datasets and by up to 99.0% for sparse datasets.
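The abstract describes the pipeline only at a high level, so the following pure-Python sketch illustrates the general ideas under stated assumptions; it is not the authors' implementation. The distance step uses the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2(x . y), whose cross terms form the matrix product that a matrix multiplication unit would execute. For neighbor identification, the sketch swaps in one possible compare-free technique chosen for illustration (extracting the IEEE-754 sign bit of eps^2 - d^2 with a shift, then setting bits in a per-point mask); it is not necessarily the paper's algorithm, and it still needs one subtraction. Neighbor counts then become population counts, and rows are processed in mini-batches to mimic limited device memory. The function names and the `batch_size` parameter are hypothetical, and the speculative execution strategies are not modeled.

```python
# Illustrative sketch, not the paper's implementation. Function names and the
# batch_size parameter are hypothetical; the sign-bit neighbor test is one
# possible compare-free technique, assumed here for illustration.
import struct
from collections import deque


def squared_norms(points):
    """||x||^2 for every point, computed once and reused by every mini-batch."""
    return [sum(v * v for v in p) for p in points]


def batch_sq_distances(batch, batch_norms, points, norms):
    """Squared distances from each point in `batch` to every point.

    Uses ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * (x . y); the x . y terms form
    the matrix product (batch x d) @ (d x n) that a matrix unit would compute.
    """
    rows = []
    for x, nx in zip(batch, batch_norms):
        rows.append([nx + ny - 2.0 * sum(a * b for a, b in zip(x, y))
                     for y, ny in zip(points, norms)])
    return rows


def sign_bit(x):
    """Bit 63 of the IEEE-754 float64 pattern of x: 1 if x is negative."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63


def neighbor_masks(points, eps, batch_size):
    """Per-point neighbor bitmask; bit j is set iff dist(i, j) <= eps.

    Rows are produced one mini-batch at a time (hypothetical batch_size),
    mimicking a device that cannot hold the full n x n distance matrix.
    """
    norms = squared_norms(points)
    eps2 = eps * eps
    masks = []
    for start in range(0, len(points), batch_size):
        stop = start + batch_size
        rows = batch_sq_distances(points[start:stop], norms[start:stop], points, norms)
        for row in rows:
            mask = 0
            for j, d in enumerate(row):
                # eps^2 - d >= 0 has sign bit 0; XOR with 1 yields "is neighbor"
                mask |= (sign_bit(eps2 - d) ^ 1) << j
            masks.append(mask)
    return masks


def dbscan(points, eps, min_pts, batch_size=2):
    """Label each point with a cluster id, or -1 for noise."""
    masks = neighbor_masks(points, eps, batch_size)
    # Population count of each mask = neighbor count (including the point itself)
    core = [bin(m).count("1") >= min_pts for m in masks]
    labels = [-1] * len(points)
    cluster = 0
    for i in range(len(points)):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not core[p]:
                continue  # border points join the cluster but do not expand it
            m, j = masks[p], 0
            while m:
                if m & 1 and labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
                m >>= 1
                j += 1
        cluster += 1
    return labels


print(dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)], eps=1.5, min_pts=3))
# prints [0, 0, 0, -1, -1, -1]
```

Packing neighbors into integers turns neighbor counting into a population count and neighbor lookup into a bit scan, so no per-element compare-and-select is needed on the distance rows. The threshold test against `min_pts` and the cluster expansion run on the host in this sketch.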

Information

Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

August 2021

927 pages

ISBN: 9781450390682

DOI: 10.1145/3472456

Copyright © 2021 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 October 2021

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

University Grants Committee

Conference

ICPP 2021

ICPP 2021: 50th International Conference on Parallel Processing

August 9 - 12, 2021

Lemont, IL, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Article Metrics

Total Citations: 10

Total Downloads: 244

Downloads (last 12 months): 65

Downloads (last 6 weeks): 8

Reflects downloads up to 17 Jan 2025.

Cited By

Lee S, An S, Kim J, Namkung H, Park J, Kim R, Lee S. 2024. Grid-Based DBSCAN Clustering Accelerator for LiDAR’s Point Cloud. Electronics 13, 17, 3395. Online publication date: 26-Aug-2024. https://doi.org/10.3390/electronics13173395

Cui C. 2024. Acceleration of Tensor-Product Operations with Tensor Cores. ACM Transactions on Parallel Computing 11, 4, 1–24. Online publication date: 9-Sep-2024. https://dl.acm.org/doi/10.1145/3695466

Yin L, Hu H, Li K, Zheng G, Qu Y, Chen H. 2023. Improvement of DBSCAN Algorithm Based on K-Dist Graph for Adaptive Determining Parameters. Electronics 12, 15, 3213. Online publication date: 25-Jul-2023. https://doi.org/10.3390/electronics12153213

Li S, Wu N, Gao K. 2023. Applications of Accelerating Genetic Algorithms in System Engineering. In 2023 World Conference on Communication & Computing (WCONF), 1–6. Online publication date: 14-Jul-2023. https://doi.org/10.1109/WCONF58270.2023.10235117

Tang Y, Wang C. 2023. SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 831–841. Online publication date: May-2023. https://doi.org/10.1109/IPDPS54959.2023.00088

Tang Y, Wang C. 2023. Performance modeling on DaVinci AI core. Journal of Parallel and Distributed Computing 175, C, 134–149. Online publication date: 1-May-2023. https://dl.acm.org/doi/10.1016/j.jpdc.2023.01.008

Zhang H, Ma W, Yuan W, Zhang J, Lu Z. 2023. Mixed-precision block incomplete sparse approximate preconditioner on Tensor core. CCF Transactions on High Performance Computing 6, 1, 54–67. Online publication date: 13-Sep-2023. https://doi.org/10.1007/s42514-023-00165-9

Lapenta G, Goldman M, Newman D, Eriksson S. 2022. Formation and Reconnection of Electron Scale Current Layers in the Turbulent Outflows of a Primary Reconnection Site. The Astrophysical Journal 940, 2, 187. Online publication date: 6-Dec-2022. https://doi.org/10.3847/1538-4357/ac98bc

Zhao Y, Chen G, Jia Z. 2022. TOD. Proceedings of the VLDB Endowment 16, 3, 546–560. Online publication date: 1-Nov-2022. https://dl.acm.org/doi/10.14778/3570690.3570703

Gallet B, Gowanlock M. 2022. Leveraging GPU Tensor Cores for Double Precision Euclidean Distance Calculations. In 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 135–144. Online publication date: Dec-2022. https://doi.org/10.1109/HiPC56025.2022.00029