ABSTRACT
Semi-supervised clustering with instance-level constraints is one of the most active research topics in pattern recognition, machine learning, and data mining. Several recent studies have shown that instance-level constraints can significantly increase the accuracy of a variety of clustering algorithms. However, instance-level constraints may split the search space of the optimal clustering solution into pieces, thereby significantly compounding the difficulty of the search task. This paper explores a genetic approach to the problem of semi-supervised clustering with instance-level constraints. In particular, it proposes a novel algorithm, termed the hybrid genetic-guided semi-supervised clustering algorithm with instance-level constraints (Cop-HGA). Cop-HGA uses a hybrid genetic algorithm to search for a high-quality clustering solution that strikes a good balance between the predefined clustering criterion and the available instance-level background knowledge. The effectiveness of Cop-HGA is confirmed by experimental results on several real data sets with artificial instance-level constraints.
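The trade-off between a clustering criterion and instance-level constraints can be illustrated with a small sketch. The code below is not Cop-HGA: the abstract does not specify its encoding, operators, or objective. It is a plain genetic algorithm over label vectors, assuming (for illustration only) a within-cluster sum of squared errors as the clustering criterion and a fixed penalty per violated must-link or cannot-link pair. All function names and parameters (`cost`, `evolve`, `penalty`, `mutation_rate`) are hypothetical, and the local-search step that would make the algorithm hybrid is omitted.

```python
import random

def within_cluster_sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Empty clusters get no centroid; no point refers to them.
    centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centroids[c]))
        for p, c in zip(points, labels)
    )

def count_violations(labels, must_link, cannot_link):
    """Number of constraint pairs that the labeling breaks."""
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl

def cost(points, labels, k, must_link, cannot_link, penalty=10.0):
    """Assumed objective: clustering criterion plus constraint penalty."""
    violations = count_violations(labels, must_link, cannot_link)
    return within_cluster_sse(points, labels, k) + penalty * violations

def evolve(points, k, must_link, cannot_link, pop_size=20,
           generations=50, mutation_rate=0.1, penalty=10.0, seed=0):
    """Plain GA: truncation selection, uniform crossover, random mutation."""
    rng = random.Random(seed)
    n = len(points)

    def score(labels):
        return cost(points, labels, k, must_link, cannot_link, penalty)

    population = [[rng.randrange(k) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=score)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    best = evolve(pts, k=2, must_link=[(0, 1)], cannot_link=[(0, 2)])
    print(best, cost(pts, best, 2, [(0, 1)], [(0, 2)]))
```

A larger `penalty` pushes the search toward labelings that satisfy every constraint, while a smaller one lets the clustering criterion dominate; how Cop-HGA itself sets this balance is described in the paper, not here.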
Index Terms
- Genetic-guided semi-supervised clustering algorithm with instance-level constraints