A novel clustering algorithm based on PageRank and minimax similarity

  • Original Article
  • Neural Computing and Applications

Abstract

Clustering by fast search and find of density peaks (herein called FDPC), a recently proposed density-based clustering algorithm, has attracted the attention of many researchers because it can recognize arbitrarily shaped clusters. In addition, FDPC needs only one parameter, \(d_c\), and identifies the number of clusters from a decision graph. Nevertheless, it is not clear how to choose a proper \(d_c\) for a given data set, and for multi-scale data sets a single suitable value may not exist in practice. In this paper, we propose a modified PageRank algorithm to compute the local density of each data point, which is more robust than the Gaussian kernel and cutoff methods. Moreover, FDPC yields poor results on randomly distributed data sets, since a single cluster may contain several density maxima. To solve this problem, we propose an improved minimax similarity method. Experiments comparing our approach with FDPC on several artificial and real-life data sets indicate that our approach outperforms FDPC in terms of accuracy.
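
The baseline quantities mentioned in the abstract can be illustrated with a minimal sketch. It is not the method proposed in this article: the cutoff and Gaussian-kernel densities and the distance to the nearest denser point follow the standard FDPC definitions, and the minimax distance is the usual path-based definition rather than the improved variant. The modified PageRank density is not reproduced, because the abstract does not specify it. All function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_density(dist, d_c, kernel="cutoff"):
    """Standard FDPC local density rho_i (assumed baseline, not the paper's).

    cutoff:   number of other points closer than d_c
    gaussian: sum over other points of exp(-(d_ij / d_c)^2)
    """
    n = len(dist)
    rho = []
    for i in range(n):
        others = [dist[i][j] for j in range(n) if j != i]
        if kernel == "cutoff":
            rho.append(float(sum(1 for d in others if d < d_c)))
        else:
            rho.append(sum(math.exp(-(d / d_c) ** 2) for d in others))
    return rho


def delta_to_denser(dist, rho):
    """Distance from each point to its nearest point of strictly higher density.

    A point with no denser neighbour (the global maximum, or any point tied
    for it) receives its largest distance to any other point instead.
    """
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def minimax_distances(dist):
    """Plain minimax path distance: the smallest achievable largest edge over
    all paths between two points, via a Floyd-Warshall-style update."""
    n = len(dist)
    m = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(m[i][k], m[k][j])
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


if __name__ == "__main__":
    # Three close points on a line plus one distant point.
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    d = pairwise_distances(pts)
    rho = local_density(d, d_c=1.5)
    print(rho)                       # [1.0, 2.0, 1.0, 0.0]
    print(delta_to_denser(d, rho))   # [1.0, 9.0, 1.0, 8.0]
    print(minimax_distances(d)[0][3])  # 8.0
```

In a decision graph, points with both large rho and large delta are candidate cluster centres. The toy example also shows the sensitivity to \(d_c\): the cutoff density changes whenever \(d_c\) crosses one of the pairwise distances. The minimax distance between the first and last point is 8.0 rather than the direct 10.0, because the path through the intermediate points has a smaller largest hop.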

Notes

  1. http://yann.lecun.com/exdb/mnist/.

  2. http://www.cis.upenn.edu/~jshi/software/.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Project No. 61702240) and the Fundamental Research Funds for the Central Universities (Project No. lzujbky-2017-191).

Author information

Corresponding author

Correspondence to Ruisheng Zhang.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript.

About this article

Cite this article

Liu, Q., Zhang, R., Liu, X. et al. A novel clustering algorithm based on PageRank and minimax similarity. Neural Comput & Applic 31, 7769–7780 (2019). https://doi.org/10.1007/s00521-018-3607-x

  • DOI: https://doi.org/10.1007/s00521-018-3607-x
