DOI: 10.1145/3472456.3473518
research-article

Accelerating DBSCAN Algorithm with AI Chips for Large Datasets

Published: 05 October 2021

Abstract

DBSCAN is a popular clustering algorithm that has been applied successfully in many real-world applications. Its advantages come at the cost of massive computation, especially for computing the distance matrix. Driven by deep learning, many Artificial Intelligence (AI) chips have been developed. With their efficient matrix multiplication units, AI chips can significantly accelerate the distance calculation. However, DBSCAN also needs to identify and count the neighbors of each point, which is challenging for most AI chips because they are over-specialized. Moreover, growing data sizes and limited device memory capacity force DBSCAN to process data in mini-batches. This results in a high data transfer overhead, which further hinders the performance of DBSCAN on AI chips. In this paper, we propose two novel techniques to address the challenges of accelerating the DBSCAN algorithm with AI chips: (1) new neighbor identification algorithms that use bitwise operations only, whereas traditional solutions require compare-and-select operations that are weakly supported on AI chips; and (2) two speculative execution strategies that reduce the data transfer overhead induced by mini-batches. Evaluations show that offloading the distance matrix calculation to tensor cores achieves a 2.61× speedup on an Nvidia RTX 3090. On the Huawei Ascend 310, our neighbor identification algorithms achieve 17.88× the throughput of CPU-based neighbor identification. The speculative execution strategies further reduce the execution time by 15.1% on average for normal datasets and by up to 99.0% for sparse datasets.
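To make the two computational steps named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes the common expansion of squared Euclidean distance into a Gram (matrix product) term, and it illustrates one generic way to test "distance below the radius" without a compare instruction: reading the IEEE-754 sign bit of a difference. The function names (`float_bits`, `squared_distance_matrix`, `neighbor_counts`) are hypothetical, and plain Python loops stand in for a hardware matrix multiplication unit.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def squared_distance_matrix(points):
    """Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 * (a . b).

    The Gram matrix below is the part that maps onto a matrix
    multiplication unit; here it is computed with nested loops.
    """
    norms = [sum(v * v for v in p) for p in points]
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    n = len(points)
    return [
        [norms[i] + norms[j] - 2.0 * gram[i][j] for j in range(n)]
        for i in range(n)
    ]


def neighbor_counts(points, eps):
    """Count, for each point, the points strictly closer than eps (itself included).

    Assumption for illustration: the test d^2 < eps^2 is replaced by the
    sign bit of (d^2 - eps^2), extracted with a shift. A distance exactly
    equal to eps yields +0.0 and is therefore not counted.
    """
    eps_sq = eps * eps
    counts = []
    for row in squared_distance_matrix(points):
        count = 0
        for d_sq in row:
            # Sign bit is 1 when the difference is negative, else 0.
            count += float_bits(d_sq - eps_sq) >> 31
        counts.append(count)
    return counts


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    print(neighbor_counts(pts, eps=1.5))  # [3, 3, 3, 1]
```

With a hypothetical minimum-neighbor threshold of 3, the first three points in the usage example would qualify as core points and the fourth would not. The sign-bit trick only shows why a comparison can in principle be reduced to bit manipulation; the algorithms the paper actually proposes may differ.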


Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN: 9781450390682
DOI: 10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI chip
  2. DBSCAN
  3. Tensor core
  4. clustering
  5. parallel computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 65
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 17 Jan 2025

Citations

Cited By

  • (2024) Grid-Based DBSCAN Clustering Accelerator for LiDAR’s Point Cloud. Electronics 13:17 (3395). DOI: 10.3390/electronics13173395. Online publication date: 26-Aug-2024
  • (2024) Acceleration of Tensor-Product Operations with Tensor Cores. ACM Transactions on Parallel Computing 11:4 (1-24). DOI: 10.1145/3695466. Online publication date: 9-Sep-2024
  • (2023) Improvement of DBSCAN Algorithm Based on K-Dist Graph for Adaptive Determining Parameters. Electronics 12:15 (3213). DOI: 10.3390/electronics12153213. Online publication date: 25-Jul-2023
  • (2023) Applications of Accelerating Genetic Algorithms in System Engineering. 2023 World Conference on Communication & Computing (WCONF) (1-6). DOI: 10.1109/WCONF58270.2023.10235117. Online publication date: 14-Jul-2023
  • (2023) SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (831-841). DOI: 10.1109/IPDPS54959.2023.00088. Online publication date: May-2023
  • (2023) Performance modeling on DaVinci AI core. Journal of Parallel and Distributed Computing 175:C (134-149). DOI: 10.1016/j.jpdc.2023.01.008. Online publication date: 1-May-2023
  • (2023) Mixed-precision block incomplete sparse approximate preconditioner on Tensor core. CCF Transactions on High Performance Computing 6:1 (54-67). DOI: 10.1007/s42514-023-00165-9. Online publication date: 13-Sep-2023
  • (2022) Formation and Reconnection of Electron Scale Current Layers in the Turbulent Outflows of a Primary Reconnection Site. The Astrophysical Journal 940:2 (187). DOI: 10.3847/1538-4357/ac98bc. Online publication date: 6-Dec-2022
  • (2022) TOD. Proceedings of the VLDB Endowment 16:3 (546-560). DOI: 10.14778/3570690.3570703. Online publication date: 1-Nov-2022
  • (2022) Leveraging GPU Tensor Cores for Double Precision Euclidean Distance Calculations. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC) (135-144). DOI: 10.1109/HiPC56025.2022.00029. Online publication date: Dec-2022
