Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

Cognitive Computation

Abstract

Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that identifies density peaks in a decision graph and assigns labels to points without iteration, which lets it detect clusters of arbitrary shape simply and efficiently. DPC has two weaknesses, however. First, a density measure based on the ϵ-neighborhood or a Gaussian kernel reflects only the global structure of the data, so the correct density peaks may not be found and performance on manifold datasets is weakened. Second, the one-step allocation strategy causes a chain reaction: once a high-density point is misallocated, a series of points that depend on it is assigned incorrectly as well. To address the first problem, this paper uses the Jaccard coefficient to measure the similarity between points. The proposed density measure depends only on the k points that are most similar to the given point, so it reflects the local structure of manifold datasets and allows the density peaks to be identified accurately. To address the chain reaction, we develop a two-step allocation strategy based on label propagation and the proposed similarity measure. The first step assigns labels to the points close to the cluster centers; these points play the role of the labeled points in the label propagation algorithm. The second step assigns each remaining point the label of the labeled point nearest to it. We compared the proposed algorithm with four other algorithms on synthetic and real-world datasets. The clustering results on the synthetic datasets verify the effectiveness of the proposed method on manifold datasets, and three evaluation metrics on the UCI datasets and the Olivetti Faces dataset show that it outperforms the other algorithms and can reveal the patterns and associations in real-world datasets.
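The abstract gives no formulas, so the following Python sketch is only one plausible reading of the ideas it describes, not the authors' implementation. It assumes that the Jaccard coefficient is taken over k-nearest-neighbor sets (each set including the point itself), that the density of a point is the sum of its k largest similarities, that cluster centers are chosen by the common heuristic of ranking points by density times distance-to-higher-density, and that "points close to the clustering centers" means the k nearest neighbors of each center. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def jaccard_similarity(d, k):
    """Jaccard coefficient between k-nearest-neighbor sets (assumed definition).

    Each set holds the k closest indices, which includes the point itself.
    The diagonal is left at zero so a point is not counted as its own neighbor.
    """
    n = d.shape[0]
    nbrs = [set(row.tolist()) for row in np.argsort(d, axis=1)[:, :k]]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return S

def local_density(S, k):
    """Density as the sum of the k largest similarities per point (assumption)."""
    return np.sort(S, axis=1)[:, -k:].sum(axis=1)

def dpc_delta(rho, d):
    """Standard DPC delta: distance to the nearest point ranked denser.

    Ties in rho are broken by index order; the top-ranked point receives
    its largest distance to any other point.
    """
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(rho))
    delta[order[0]] = d[order[0]].max()
    for p in range(1, len(order)):
        delta[order[p]] = d[order[p], order[:p]].min()
    return delta

def two_step_assign(d, centers, k):
    """Two-step allocation, simplified for illustration.

    Step 1 labels the k nearest neighbors of each center (assumed meaning of
    "close to the cluster centers"). Step 2 gives every remaining point the
    label of its nearest point labeled in step 1.
    """
    labels = np.full(d.shape[0], -1)
    for c_label, c in enumerate(centers):
        labels[np.argsort(d[c])[:k]] = c_label
    labels[centers] = np.arange(len(centers))  # centers always keep their own label
    labeled = np.flatnonzero(labels >= 0)
    for i in np.flatnonzero(labels < 0):
        labels[i] = labels[labeled[np.argmin(d[i, labeled])]]
    return labels

# Toy usage: two tight, well-separated groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(100.0, 0.1, (5, 2))])
d = pairwise_distances(X)
S = jaccard_similarity(d, k=3)
rho = local_density(S, k=3)
delta = dpc_delta(rho, d)
centers = np.argsort(-(rho * delta))[:2]  # top-2 by rho * delta (heuristic)
labels = two_step_assign(d, centers, k=3)
```

Because the similarity depends only on shared neighbors, two points in different groups have similarity zero however the raw distances are scaled. That is the local behavior the abstract attributes to the Jaccard-based density. The second step here uses plain Euclidean distance to the nearest labeled point; the paper may instead use the proposed similarity at this step.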

Funding

This work was supported by National Natural Science Foundation of China (21606159, 62176176).

Author information

Corresponding author

Correspondence to Xiaoxia Han.

Ethics declarations

Ethics Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qin, X., Han, X., Chu, J. et al. Density Peaks Clustering Based on Jaccard Similarity and Label Propagation. Cogn Comput 13, 1609–1626 (2021). https://doi.org/10.1007/s12559-021-09906-w

  • DOI: https://doi.org/10.1007/s12559-021-09906-w
