Abstract
Clustering by fast search and detection of density peaks (DPC, Density Peaks Clustering) is a relatively novel clustering algorithm published in the Science journal. As a density-based clustering algorithm, DPC produces better clustering results while using less parameters than other relevant algorithms. However, we found that the DPC algorithm does not perform well if clusters with different densities are very close. To address this problem, we propose a new DPC algorithm by incorporating an improved mutual k-nearest-neighbor graph (Mk-NNG) into DPC. Our Mk-NNG-DPC algorithm leverages the distance matrix of data samples to improve the Mk-NNG, and then utilizes DPC to constrain and select cluster centers. The proposed Mk-NNG-DPC algorithm ensures an instance to be allocated to the fittest cluster. Experimental results on synthetic and real world datasets show that our Mk-NNG-DPC algorithm can effectively and efficiently improve clustering performance, even for clusters with arbitrary shapes.
Similar content being viewed by others
References
Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, Burlington
Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, New York
Chifu AG, Hristea F, Mothe J, Popescu M (2015) Word sense discrimination in information retrieval: a spectral clustering-based approach. Inf Process Manag 51(2):16–31
Kaufman L, Rousseeuw P (2009) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken
Kearns M, Mansour Y, Ng AY (1999) An information-theoretic analysis of hard and soft assignment methods for clustering. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 495–520
Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
Bishnu PS, Bhattacherjee V (2013) A modified K-modes clustering algorithm. pattern recognition and machine intelligence, Volume 8251 of the series Lecture Notes in Computer Science, 2013, pp 60–66
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of ACM SIGMOD international conference on management of data, 1996, pp 103–114
Karypis G, Han E, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput 32(8):68–75
Fan J (2015) OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput Appl. https://doi.org/10.1007/s00521-015-1998-5
Ester M, Kriegel H, Sander J, Xu X, Simoudis E, Han J, Fayyad UM (eds) (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96). AAAI Press, pp 226–231
Hinneburg A, Gabriel HH (2007) DENCLUE 2.0: fast clustering based on kernel density estimation. In: Proceedings of the 2007 international conference on intelligent data analysis (IDA’07), Ljubljana, Slovenia, 2007, pp 70–80
Banerjee A, Shan H (2010) Model-based clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning, pp 686–689
Ding S, Zhang N, Zhang J, Xu X, Shi Z (2017) Unsupervised extreme learning machine with representational features. Int J Mach Learn Cybern 8(2):587–595
Du M, Ding S, Xu X, Xue Y (2017) Density peaks clustering using geodesic distances. Int J Mach Learn Cybern. https://doi.org/10.1007/s13042-017-0648-x
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Arias-Castro E, Chen G, Lerman G (2011) Spectral clustering based on local linear approximations. Electron J Stat 5(1):1537–1587
Székely GJ, Rizzo ML (2005) Hierarchical clustering via Joint between-within distances: extending ward’s minimum variance method. J Classif 22(2):151–183
Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New Jersey
Fu G (1998) Optimization methods for fuzzy clustering. Fuzzy Sets Syst 93(3):301–309
Nayak J, Naik B, Behera HS (2014) Fuzzy C-means (FCM) clustering algorithm: a decade review from 2000 to 2014. Computational Intelligence in Data Mining-Volume 2, Volume 32 of the series Smart Innovation, Systems and Technologies, pp 133–149
Brito MR, Chávez EL, Quiroz AJ, Yukich JE (1997) Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat Probab Lett 35(1):33–42
Sardana D, Bhatnagar R (2014) Graph clustering using mutual K-nearest neighbors. Active Media Technology, Volume 8610 of the series Lecture Notes in Computer Science, pp 35–48
Xie J, Gao H, Xie W (2016) K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset. Sci Sin Inf 46(2):258–280
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. University of California, School of Information and Computer Science, Irvine
Cover TM, Thomas JA (2001) Elements of information theory. Wiley, Hoboken
Fan J, Niu Z, Liang Y, Zhao Z (2016) Probability model selection and parameter evolutionary estimation for clustering imbalanced data without sampling. Neurocomputing 211(10):172–181
Wang F, Zhang C (2005) Spectral clustering for time series. In: Proceedings of third international conference on advances in pattern recognition, ICAPR 2005, Bath, UK, August 22–25, 2005, pp 345–354
Xu X, Ding S, Du M, Xue Y (2018) DPCG: an efficient density peaks clustering algorithm based on grid. Int J Mach Learn Cybern 9(5):743–754
Du M, Ding S, Xue Y (2018) A robust density peaks clustering algorithm using fuzzy neighborhood. Int J Mach Learn Cybern 9(7):1131–1140
Bai X, Yang P, Shi X (2017) An overlapping community detection algorithm based on density peaks. Neurocomputing 226(2):7–15
Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science. Springer, Berlin Heidelberg, pp 160–172
Li J, Huang X, Selke C, Yong J (2007) A fast algorithm for finding correlation clusters in noise data. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining, pp 639–647
Zhang T-T, Yuan B (2018) Density-based multiscale analysis for clustering in strong noise settings with varying densities. IEEE Access 6:25861–25873
Zhang H, Wang S, Xu X, Chow TWS, Wu QMJ (2018) Tree2Vector: learning a vectorial representation for tree-structured data. IEEE Trans Neural Netw Learn Syst 29(11):5304–5318
Wang X, Xing H-J, Li Y et al (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654
Wang R, Wang X, Kwong S, Chen X (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEE Trans Fuzzy Syst 25(6):1460–1475
Wang X, Wang R, Chen X (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48(2):703–715
Wang X, Zhang T, Wang R (2019) Non-iterative deep learning: incorporating restricted boltzmann machine into multilayer random weight neural networks. IEEE Trans Syst Man Cybern Syst 49(7):1299–1380
Lin JCW, Yang L, Fournier-Viger P, Hong TP (2018) Mining of skyline patterns by considering both frequent and utility constraints. Eng Appl Artif Intell 77:229–238
Fournier-Viger P, Lin JCW, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recognit 1(1):54–77
Chen CM, Xiang B, Liu Y, Wang KH (2019) A secure authentication protocol for internet of vehicles. IEEE ACCESS 7(1):12047–12057
Chen CM, Xiang B, Wang KH, Yeh KH, Wu TY (2018) A robust mutual authentication with a key agreement scheme for session initiation protocol. Appl Sci 8(10):1
Yang C, Huang L, Li F (2018) Exponential synchronization control of discontinuous non-autonomous networks and autonomous coupled networks. Complexity 1:1–10
Lian D, Xianwen F, Chuangxia H (2017) Global exponential convergence in a delayed almost periodic nicholsons blowflies model with discontinuous harvesting. Math Methods Appl Sci 41(5):1954–1965
Lian D, Lihong H, Zhenyuan G (2017) Periodic attractor for reactiondiffusion high-order hopfield neural networks with time-varying delays. Comput Math Appl 73(2):233–245
Huang C, Liu B, Tian X, Yang L, Zhang X (2019) Global convergence on asymptotically almost periodic SICNNs with nonlinear decay functions. Neural Process Lett 49(2):625–641
Huang C, Zhang H, Huang L (2019) Almost periodicity analysis for a delayed Nicholson’s blowflies model with nonlinear density-dependent mortality term. Commun Pure Appl Anal 18(6):3337–3349
Huang C, Zhang H (2019) Periodicity of non-autonomous inertial neural networks involving proportional delays and non-reduced order method. Int J Biomath 12(02):1950016
Huang C, Cao J, Wen F, Yang X (2016) Stability analysis of SIR model with distributed delay on complex networks. PLoS One 11(8):e0158813
Li Y, Fan JC, Pan JS, Mao GH, Wu GK (2019) A novel rough fuzzy clustering algorithm with a new similarity measurement. J Internet Technol 20(4):1
Fan J-C, Li Y, Tang Lei-Yu, Geng-Kun W (2018) RoughPSO: rough set-based particle swarm optimisation. Int J Bio-Inspired Comput 12(4):245–253
Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments and suggestions. This work is supported by Shandong Provincial Natural Science Foundation of China under Grant ZR2018MF009, the State Key Research Development Program of China under Grant 2017YFC0804406, the National Natural Science Foundation of China under Grant 61433012, and 61303167, the Special Funds of Taishan Scholars Construction Project, and Leading Talent Project of Shandong University of Science and Technology.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Fan, Jc., Jia, Pl. & Ge, L. Mk-NNG-DPC: density peaks clustering based on improved mutual K-nearest-neighbor graph. Int. J. Mach. Learn. & Cyber. 11, 1179–1195 (2020). https://doi.org/10.1007/s13042-019-01031-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-019-01031-3