Abstract
Most variants of the fuzzy c-means (FCM) clustering algorithm that incorporate prior knowledge do so by modifying the objective function or the clustering process. This paper proposes a new weighted semi-supervised FCM algorithm (SSFCM-HPR) that transforms the prior knowledge carried by the labeled samples into constraints on the fuzzy membership degrees, weights each constraint according to the representativeness of its sample, and then solves the resulting clustering problem with the HPR multiplier method. The “representativeness” of a labeled sample is determined by its distance to the center of the cluster it belongs to. In this paper, we take the ratio of the largest to the second-largest fuzzy membership degree of a labeled sample as its weight. The algorithm not only retains the fuzzy partition of the labeled samples, which ensures that they effectively guide the clustering process, but can also detect whether a sample is an outlier. Moreover, when part of the supervised information in the labeled samples is wrong, the algorithm reduces the influence of the incorrectly labeled samples on the final clustering result. Experimental evaluation on synthetic and real data sets demonstrates the efficiency and effectiveness of the approach.
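The weighting rule described in the abstract (largest membership degree divided by the second largest) can be illustrated with a small sketch. The abstract does not say how the membership degrees themselves are obtained, so the sketch assumes the standard FCM membership formula with fuzzifier m = 2; the function names `fcm_memberships` and `representativeness_weight` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fcm_memberships(distances: Sequence[float], m: float = 2.0) -> List[float]:
    """Membership degrees of one sample, given its distances to each cluster center.

    Assumption: the standard FCM update u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)).
    The paper may compute memberships differently.
    """
    zeros = sum(1 for d in distances if d == 0.0)
    if zeros:
        # The sample coincides with one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** (-exponent) for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


def representativeness_weight(memberships: Sequence[float]) -> float:
    """Ratio of the largest to the second-largest membership degree."""
    if len(memberships) < 2:
        raise ValueError("at least two clusters are required")
    largest, second = sorted(memberships, reverse=True)[:2]
    if second == 0.0:
        # The sample belongs entirely to one cluster.
        return float("inf")
    return largest / second


# A sample close to one center gets a large weight ...
clear_sample = fcm_memberships([1.0, 2.0, 4.0])
# ... while a sample equidistant from two centers gets the minimum weight of 1.
ambiguous_sample = fcm_memberships([1.0, 1.0])

print(representativeness_weight(clear_sample))
print(representativeness_weight(ambiguous_sample))
```

Under these assumptions, a labeled sample that lies near its own cluster center and far from the others receives a weight well above 1, whereas a sample lying between clusters receives a weight close to 1, so its label has less influence on the clustering.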
Cite this article
Zeng, S., Tong, X., Sang, N. et al. A study on semi-supervised FCM algorithm. Knowl Inf Syst 35, 585–612 (2013). https://doi.org/10.1007/s10115-012-0521-x
DOI: https://doi.org/10.1007/s10115-012-0521-x