ABSTRACT
Data clustering is a useful benchmark problem for testing the performance of combinatorial optimization methods. However, little work has been done on applying estimation of distribution algorithms to it. This paper aims to demonstrate that estimation of distribution algorithms are effective for data clustering. In particular, it proposes a novel encoding strategy, termed the Similarity Matrix Encoding (SME) strategy, and a Virtual Population Based Incremental Learning algorithm that uses SME (VPBIL-SME), for clustering a set of unlabeled instances into groups. Experimental results on several real data sets confirm the effectiveness of VPBIL-SME.
Index Terms
- Data clustering using virtual population based incremental learning algorithm with similarity matrix encoding strategy