A clustering algorithm based on density decreased chain for data with arbitrary shapes and densities

Applied Intelligence

Abstract

Density-based clustering has received increasing attention for its ability to handle clusters of arbitrary shapes. However, it still has difficulty finding clusters of arbitrary densities, especially clusters in sparse regions when dense regions are also present. To address this problem, this paper introduces the density decreased chain, a new concept defined on the mutual k-NN graph. A chain starts at a local density center, a point whose density is the highest among the data points connected to it. The core point is then redefined in terms of the chain: a core point is a point whose density is close to that of the local density center on the same density decreased chain. Because local density centers exist in both sparse and dense regions, this definition allows core points to be identified in data with arbitrary densities. Further, an intra-cluster density decreased chain is defined to mine subclusters among the core points. Once the subclusters are formed, each remaining data point is hierarchically assigned to one of them through the density decreased chains that contain it. Experiments illustrate the effectiveness of the proposed method.
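
The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is linked under Notes), and it rests on several assumptions that the abstract does not specify: density is taken as the inverse of the mean distance to the k nearest neighbours, a chain is any path from a local density center along mutual k-NN edges on which density never increases, and a point counts as a core point when its density is at least a hypothetical fraction `alpha` of its center's density. All function names are illustrative.

```python
from collections import deque

import numpy as np


def mutual_knn_graph(X, k):
    """Link i and j only if each is among the other's k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is never its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest points
    knn_sets = [set(row.tolist()) for row in knn]
    adj = [{j for j in knn_sets[i] if i in knn_sets[j]} for i in range(len(X))]
    return adj, d, knn


def knn_density(d, knn):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    return 1.0 / np.take_along_axis(d, knn, axis=1).mean(axis=1)


def local_density_centers(adj, rho):
    """Points at least as dense as every point they are connected to."""
    return [i for i, nbrs in enumerate(adj)
            if nbrs and all(rho[i] >= rho[j] for j in nbrs)]


def chain_members(center, adj, rho):
    """Points reachable from `center` along edges where density never increases."""
    seen = {center}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and rho[v] <= rho[u]:
                seen.add(v)
                queue.append(v)
    return seen


def core_points(center, adj, rho, alpha=0.8):
    """Chain members whose density is close to the center's (alpha is hypothetical)."""
    return {i for i in chain_members(center, adj, rho)
            if rho[i] >= alpha * rho[center]}


# Toy data: a dense 3x3 grid (indices 0-8) and a sparse 3x3 grid (indices 9-17).
grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
X = np.vstack([grid, 100.0 + 5.0 * grid])

adj, d, knn = mutual_knn_graph(X, k=3)
rho = knn_density(d, knn)
for c in local_density_centers(adj, rho):
    print(c, round(float(rho[c]), 3), sorted(core_points(c, adj, rho)))
```

Because core points are judged relative to their own local density center rather than against a single global threshold, the sparse grid in this toy example yields centers and core points just as the dense grid does, which is the property the abstract emphasises.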

Notes

  1. https://github.com/liruijia2017/Density-decreased-chain-based-clustering

Acknowledgements

This work was funded by the National Natural Science Foundation of China (Grant No. 61772120).

Author information

Corresponding author

Correspondence to Ruijia Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, R., Cai, Z. A clustering algorithm based on density decreased chain for data with arbitrary shapes and densities. Appl Intell 53, 2098–2109 (2023). https://doi.org/10.1007/s10489-022-03583-4

  • DOI: https://doi.org/10.1007/s10489-022-03583-4
