AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

Kim, Jeong-Hun; Choi, Jong-Hyeok; Yoo, Kwan-Hee; Nasridinov, Aziz

doi:10.1007/s11227-018-2380-z

Published: 08 May 2018

The Journal of Supercomputing, Volume 75, pages 142–169 (2019)



Abstract

Clustering is a typical data mining technique that partitions a dataset into multiple subsets of similar objects according to a similarity metric. In particular, density-based algorithms can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, finds clusters by defining the density criterion with two global parameters, \( \varepsilon \)-distance and \( MinPts \). However, most density-based algorithms, including DBSCAN, find clusters incorrectly when the dataset contains clusters of varying densities, because a single density criterion fixed by the global parameters is applied to all of them. Although previous studies have sought to determine optimal parameters or to improve clustering performance through additional parameters and computations, these approaches significantly increase the running time of clustering, particularly on large datasets. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive \( \varepsilon \)-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define the density layers of a dataset. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, which achieve clustering performance similar to that of existing algorithms for finding clusters with varying densities while significantly reducing the running time required for clustering. We evaluate AA-DBSCAN and kAA-DBSCAN through extensive experiments against state-of-the-art algorithms. The experimental results demonstrate improved clustering performance and reduced running time for the proposed algorithms.
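The abstract's starting point, that one global \( \varepsilon \)-distance cannot serve clusters of different densities, can be illustrated with a minimal sketch of the classic DBSCAN procedure. This is not the authors' AA-DBSCAN or kAA-DBSCAN, and it does not reproduce their quadtree-based density layers; the helper names `region_query`, `dbscan`, and `grid`, the toy grid dataset, and the parameter values are hypothetical choices made only for illustration. The neighborhood search is a brute-force O(n²) scan rather than an index structure.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN with one global eps and MinPts; returns one label per point (-1 = noise)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # density-reachable but not core: border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


def grid(x0: float, y0: float, step: float, n: int = 5) -> List[Point]:
    """Hypothetical toy cluster: an n x n grid of points with the given spacing."""
    return [(x0 + a * step, y0 + b * step) for a in range(n) for b in range(n)]


# Hypothetical data: two dense grids 0.6 apart, plus one sparse grid far away.
points = grid(0.0, 0.0, 0.1) + grid(1.0, 0.0, 0.1) + grid(10.0, 10.0, 1.0)

small = dbscan(points, eps=0.15, min_pts=3)
large = dbscan(points, eps=1.1, min_pts=3)
print(sorted(set(small[:25])), sorted(set(small[25:50])), sorted(set(small[50:])))  # [0] [1] [-1]
print(sorted(set(large[:25])), sorted(set(large[25:50])), sorted(set(large[50:])))  # [0] [0] [1]
```

With the small radius, the two dense grids are separated correctly but every point of the sparse grid is labelled as noise; with the large radius, the sparse grid becomes a cluster but the two dense grids merge into one. No single global \( \varepsilon \) recovers all three groups, which is the failure mode that an adaptive \( \varepsilon \)-distance per density is meant to address.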



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1D1A3B03035729).

Author information

Authors and Affiliations

Department of Computer Science, Chungbuk National University, Cheongju-si, 28644, Korea
Jeong-Hun Kim, Jong-Hyeok Choi, Kwan-Hee Yoo & Aziz Nasridinov


Corresponding author

Correspondence to Aziz Nasridinov.


About this article

Cite this article

Kim, JH., Choi, JH., Yoo, KH. et al. AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities. J Supercomput 75, 142–169 (2019). https://doi.org/10.1007/s11227-018-2380-z

Issue Date: 09 January 2019
DOI: https://doi.org/10.1007/s11227-018-2380-z
