ABSTRACT
Clustering is a fundamental and hence widely studied problem in data analysis. From a multi-objective perspective, this paper combines principles from two different clustering paradigms: the connectivity principle from density-based methods is integrated into the partitional clustering approach. To this end, the standard k-Means algorithm is hybridized with Particle Swarm Optimization. The new method (PSO-kMeans) benefits from both a local and a global view of the data and alleviates some drawbacks of the k-Means algorithm; as a result, it is able to detect types of clusters that are otherwise difficult to obtain (elongated shapes, dissimilar volumes). Our experimental results show that PSO-kMeans improves on the performance of standard k-Means in all test cases and, in the worst case, performs at least comparably to state-of-the-art methods. PSO-kMeans is also robust to outliers. This comes at a cost: a preprocessing step that finds the nearest neighbors of each data item is required, which raises the linear complexity of k-Means to quadratic.
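To make the connectivity principle and the quadratic preprocessing cost concrete, the following is a minimal sketch and not the paper's implementation. It assumes a brute-force nearest-neighbor search and a connectivity penalty of the kind commonly used in multi-objective clustering, where the r-th nearest neighbor of an item contributes 1/r if it lies in a different cluster. The function names `nearest_neighbors` and `connectivity_penalty`, the neighborhood size, and the toy data are all hypothetical; the abstract does not state the exact objective used by PSO-kMeans.

```python
import math


def nearest_neighbors(points, n_neighbors):
    """Brute-force neighbor lists; needs O(n^2) distance evaluations."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from item i to every other item: this is the quadratic step.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # ties are broken by item index
        neighbors.append([j for _, j in dists[:n_neighbors]])
    return neighbors


def connectivity_penalty(labels, neighbors):
    """Assumed penalty: the r-th neighbor costs 1/r when it is in another cluster."""
    total = 0.0
    for i, neighbor_list in enumerate(neighbors):
        for rank, j in enumerate(neighbor_list, start=1):
            if labels[i] != labels[j]:
                total += 1.0 / rank
    return total


# Toy data (hypothetical): two elongated, parallel stripes of six items each.
points = [(float(x), 0.0) for x in range(6)] + [(float(x), 3.0) for x in range(6)]
neighbors = nearest_neighbors(points, n_neighbors=2)

by_stripe = [0] * 6 + [1] * 6        # partition that follows the elongated shapes
by_halves = [0, 0, 0, 1, 1, 1] * 2   # partition that cuts each stripe in the middle

penalty_by_stripe = connectivity_penalty(by_stripe, neighbors)
penalty_by_halves = connectivity_penalty(by_halves, neighbors)
print(penalty_by_stripe, penalty_by_halves)
```

On this toy data the partition that follows the stripes incurs no penalty, while the partition that cuts through them is penalized at each cut. A measure of this kind supplies the local view that a purely centroid-based criterion lacks, and the neighbor lists only need to be computed once before the search starts.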
Index Terms
- PSO aided k-means clustering: introducing connectivity in k-means