Clustering stability-based Evolutionary K-Means

He, Zhenfeng; Yu, Chunyan

doi:10.1007/s00500-018-3280-0

Methodologies and Application
Published: 02 June 2018

Soft Computing, volume 23, pages 305–321 (2019)

Abstract

Evolutionary K-Means (EKM), which combines K-Means with a genetic algorithm, addresses K-Means’ initialization problem by selecting parameters automatically through the evolution of partitions. Current EKM algorithms usually use the silhouette index as the cluster validity index, and they are effective on well-separated clusters. However, their performance on noisy data is often disappointing. Clustering stability-based approaches, on the other hand, are more robust to noise, yet they need an intelligent starting point to find some challenging clusters. It is therefore necessary to combine EKM with clustering stability-based analysis. In this paper, we present a novel EKM algorithm that uses clustering stability to evaluate partitions. We first introduce two weighted aggregated consensus matrices, the positive aggregated consensus matrix (PA) and the negative aggregated consensus matrix (NA), to store the clustering tendency of each pair of instances. Specifically, PA stores the tendency of a pair to share the same label, and NA stores the tendency of a pair to have different labels. Based on these matrices, clusters and partitions can be evaluated from the viewpoint of clustering stability. We then propose a clustering stability-based EKM algorithm, CSEKM, that evolves partitions and the aggregated matrices simultaneously. To evaluate the algorithm’s performance, we compare it with an EKM algorithm, two consensus clustering algorithms, a clustering stability-based algorithm and a multi-index-based clustering approach. Experimental results on a series of artificial datasets, two simulated datasets and eight UCI datasets suggest that CSEKM is more robust to noise.
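To make the role of the two matrices concrete, the following is a minimal sketch of how a positive and a negative aggregated consensus matrix could be accumulated from a set of partitions. It is an illustration under stated assumptions, not the CSEKM procedure itself: the abstract does not specify how partitions are weighted or how the matrices are turned into a stability score, so the `weights` argument, the function names `aggregate_consensus` and `pair_stability`, and the ratio PA / (PA + NA) are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def aggregate_consensus(
    partitions: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[Matrix, Matrix]:
    """Accumulate pairwise co-clustering evidence over several partitions.

    pa[i][j] sums the weights of partitions that give i and j the same label;
    na[i][j] sums the weights of partitions that give them different labels.
    The weighting scheme is an assumption (uniform by default).
    """
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    pa = [[0.0] * n for _ in range(n)]
    na = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        if len(labels) != n:
            raise ValueError("all partitions must label the same instances")
        for i in range(n):
            for j in range(i + 1, n):
                # Same label adds to PA, different labels add to NA.
                target = pa if labels[i] == labels[j] else na
                target[i][j] += w
                target[j][i] += w
    return pa, na


def pair_stability(pa: Matrix, na: Matrix, i: int, j: int) -> float:
    """Hypothetical score: share of the evidence saying i and j belong together."""
    total = pa[i][j] + na[i][j]
    return pa[i][j] / total if total > 0 else 0.5


# Three hand-written partitions of four instances (illustrative data only).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
pa, na = aggregate_consensus(partitions)
```

In this toy input, instances 0 and 1 share a label in every partition, so all of their evidence lands in the positive matrix, while instances 0 and 2 are separated in two of the three partitions. An evolutionary loop could, in principle, update such matrices each generation and score candidate partitions against them; how CSEKM actually does this is described in the full paper, not in the abstract.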

Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant No. 60805042) and the Fujian Natural Science Foundation (Grant No. 2018J01794).

Author information

Authors and Affiliations

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
Zhenfeng He & Chunyan Yu

Corresponding author

Correspondence to Zhenfeng He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, Z., Yu, C. Clustering stability-based Evolutionary K-Means. Soft Comput 23, 305–321 (2019). https://doi.org/10.1007/s00500-018-3280-0

Published: 02 June 2018
Issue Date: 24 January 2019
DOI: https://doi.org/10.1007/s00500-018-3280-0
