Hashing-Based Approximate DBSCAN

Li, Tianrun; Heinis, Thomas; Luk, Wayne

doi:10.1007/978-3-319-44039-2_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9809)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

Analyzing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, however, current approaches for data analysis struggle. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time.

The data analysis and clustering process, however, is rarely straightforward: parameters need to be determined through several iterations. Entirely accurate results are thus rarely needed, and we can instead sacrifice some precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce distance calculation overhead: (1) locality-sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster (up to 30x in our experiments) than the state of the art.
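To make the first of the two measures concrete, the following is a minimal sketch of how locality-sensitive hashing can cut down the number of distance calculations in a DBSCAN-style neighbourhood query. It is not the ADvaNCE implementation: it assumes a standard Gaussian-projection hash of the form floor((a·x + b) / w), and the function names, the bucket width of 4·eps, and the table counts are all hypothetical choices made for illustration. Representative point selection is not shown.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """One Gaussian-projection LSH function: h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(point):
        projection = sum(ai * xi for ai, xi in zip(a, point))
        return math.floor((projection + b) / width)

    return h


def build_table(points, hashes):
    """Group point indices by their concatenated hash key."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[tuple(h(p) for h in hashes)].append(idx)
    return table


def approx_neighbors(points, eps, num_tables=8, hashes_per_table=2, seed=0):
    """Approximate eps-neighbourhoods: exact distances are computed only
    for pairs of points that share a bucket in at least one hash table."""
    rng = random.Random(seed)
    dim = len(points[0])
    neighbors = [set() for _ in points]
    for _ in range(num_tables):
        # Illustrative assumption: bucket width of 4 * eps.
        hashes = [make_hash(dim, 4 * eps, rng) for _ in range(hashes_per_table)]
        for bucket in build_table(points, hashes).values():
            for i in bucket:
                for j in bucket:
                    if i != j and math.dist(points[i], points[j]) <= eps:
                        neighbors[i].add(j)
    return neighbors
```

Because every candidate pair is still verified with an exact distance check, this sketch never reports a false neighbour; the approximation comes from true neighbours that may be missed when they never land in the same bucket. Using more tables lowers the chance of such misses at the cost of more work.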

Author information

Authors and Affiliations

Tsinghua University, Beijing, China
Tianrun Li
Imperial College London, London, UK
Thomas Heinis & Wayne Luk

Corresponding author

Correspondence to Thomas Heinis.

Editor information

Editors and Affiliations

MFF, Charles University, Prague, Czech Republic
Jaroslav Pokorný
Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia
Mirjana Ivanović
Christian-Albrechts-Universität Kiel, Kiel, Germany
Bernhard Thalheim
VSB-Technical University Ostrava, Ostrava, Czech Republic
Petr Šaloun

About this paper

Cite this paper

Li, T., Heinis, T., Luk, W. (2016). Hashing-Based Approximate DBSCAN. In: Pokorný, J., Ivanović, M., Thalheim, B., Šaloun, P. (eds) Advances in Databases and Information Systems. ADBIS 2016. Lecture Notes in Computer Science, vol 9809. Springer, Cham. https://doi.org/10.1007/978-3-319-44039-2_3

DOI: https://doi.org/10.1007/978-3-319-44039-2_3
Published: 14 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44038-5
Online ISBN: 978-3-319-44039-2
eBook Packages: Computer Science, Computer Science (R0)
