Abstract
Density-based clustering has attracted growing attention because it can handle clusters of arbitrary shape. However, it still struggles with clusters of arbitrary density, especially sparse clusters that coexist with dense ones. To address this problem, this paper introduces a new concept, the density decreased chain, defined on the mutual k-NN graph. A chain starts at a local density center, a point whose density is the highest among the data points connected to it. Based on the density decreased chain, the core point is redefined: a core point is one whose density is close to that of the local density center on the same density decreased chain. Because local density centers exist in both sparse and dense regions, this definition allows core points to be identified in data with arbitrary densities. An intra-cluster density decreased chain is further defined to mine subclusters among the core points. After the subclusters are formed, each remaining data point is hierarchically assigned to one of them through the density decreased chains that contain it. Experiments illustrate the effectiveness of the proposed method.
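The ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the density estimate, the closeness criterion, or the chain construction, so everything below is an assumption made for illustration. Density is taken as the inverse mean distance to the k nearest neighbours, a chain is approximated by the set of points reachable from a center along mutual k-NN edges on which density does not increase, and "close to the center's density" is modelled with a hypothetical ratio threshold `alpha`. All function names are invented here, ties in density are allowed to produce several centers, and points with no mutual neighbours are never treated as centers.

```python
import numpy as np

def mutual_knn_graph(X, k):
    """Boolean adjacency: i and j are linked iff each is among the other's k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]                # indices of the k nearest neighbours
    knn = np.zeros(d.shape, dtype=bool)
    knn[np.arange(len(X))[:, None], nn] = True
    return knn & knn.T, d, nn

def knn_density(d, nn):
    """Assumed density estimate: inverse of the mean distance to the k nearest neighbours."""
    return 1.0 / np.take_along_axis(d, nn, axis=1).mean(axis=1)

def local_density_centers(adj, rho):
    """Points whose density is at least that of every mutual k-NN neighbour."""
    centers = []
    for i in range(len(rho)):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size and np.all(rho[i] >= rho[nbrs]):
            centers.append(i)
    return centers

def density_decreased_chain(adj, rho, center):
    """Points reachable from `center` along edges on which density never increases."""
    seen, stack = {center}, [center]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(adj[i]):
            j = int(j)
            if j not in seen and rho[j] <= rho[i]:
                seen.add(j)
                stack.append(j)
    return sorted(seen)

def core_points(adj, rho, alpha=0.7):
    """Points whose density is at least `alpha` times that of a center on their chain."""
    core = set()
    for c in local_density_centers(adj, rho):
        for i in density_decreased_chain(adj, rho, c):
            if rho[i] >= alpha * rho[c]:
                core.add(i)
    return sorted(core)

# Toy data: a dense 1-D group (spacing 1) and a sparse one (spacing 10).
X = np.array([[0.], [1.], [2.], [3.], [4.], [100.], [110.], [120.], [130.], [140.]])
adj, d, nn = mutual_knn_graph(X, k=2)
rho = knn_density(d, nn)
print(local_density_centers(adj, rho))
print(core_points(adj, rho, alpha=0.7))
```

On this toy data the sparse group is ten times less dense than the dense group, yet it still contains local density centers, and the relative threshold marks its interior points as core. A single global density threshold, as in DBSCAN-style methods, would treat the two groups very differently, which is the motivation the abstract gives for anchoring the core-point definition to local centers.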
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 61772120).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, R., Cai, Z. A clustering algorithm based on density decreased chain for data with arbitrary shapes and densities. Appl Intell 53, 2098–2109 (2023). https://doi.org/10.1007/s10489-022-03583-4
DOI: https://doi.org/10.1007/s10489-022-03583-4