ABSTRACT
Incorporating semi-supervision into cluster detection has proved especially useful when high consistency is required between the data partitioning and the user's knowledge of the data domain. In recent years, several strategies for semi-supervised clustering have been proposed. These strategies guide cluster detection with constraints, used either to influence the allocation of elements to the most appropriate cluster at each iteration of the algorithm or to modify the objective function employed. This paper proposes a novel approach for incorporating semi-supervision into the well-known k-means algorithm. The proposed semi-supervised clustering method uses constraint information to define multiple assistant representatives for the centroids used at each iteration of k-means. A refinement process is designed to reduce the number of assistant representatives considered for each centroid without degrading clustering quality. Experimental results on eight synthetic datasets show the potential of the proposed approach for dealing with complex data structures composed of clusters of different shapes.
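The abstract does not give the details of how assistant representatives are built from constraints or how they are refined, so the following is only a minimal sketch of the general multi-representative idea, not the authors' algorithm. It assumes that each cluster is described by a centroid plus a list of assistant representatives, and that a point is assigned to the cluster owning its nearest representative. The function name `assign_to_clusters`, the dictionary layout, and the toy coordinates are all hypothetical.

```python
import math


def assign_to_clusters(points, representatives):
    """Assign each point to the cluster that owns its nearest representative.

    `representatives` maps a cluster id to a list of coordinates: the
    centroid followed by any assistant representatives. This layout is an
    assumption made for illustration only.
    """
    labels = []
    for p in points:
        best_cluster, best_dist = None, math.inf
        for cluster_id, reps in representatives.items():
            # Distance from the point to the closest representative of this cluster.
            d = min(math.dist(p, r) for r in reps)
            if d < best_dist:
                best_cluster, best_dist = cluster_id, d
        labels.append(best_cluster)
    return labels


points = [(8.0, 1.0), (8.0, 2.5), (1.0, -0.5)]

# Cluster 0 is elongated along the x-axis: a centroid plus two assistants.
with_assistants = {
    0: [(4.0, 0.0), (0.0, 0.0), (8.0, 0.0)],
    1: [(8.0, 3.0)],
}
# Same clusters described by their centroids only, as in plain k-means.
centroids_only = {
    0: [(4.0, 0.0)],
    1: [(8.0, 3.0)],
}

labels_multi = assign_to_clusters(points, with_assistants)
labels_single = assign_to_clusters(points, centroids_only)
print(labels_multi)   # [0, 1, 0]
print(labels_single)  # [1, 1, 0]
```

In this toy setup the point (8.0, 1.0) lies at the far end of the elongated cluster 0. With centroids alone it is closer to the centroid of cluster 1, while the extra representative at (8.0, 0.0) pulls it back to cluster 0, which illustrates why several representatives per cluster can help with non-spherical shapes.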
Index Terms
- Semi-supervised clustering using multi-assistant-prototypes to represent each cluster