Abstract
Semi-supervised clustering is a learning method that combines semi-supervised learning (SSL) with cluster analysis, and it is widely valued and applied in machine learning. Traditional unsupervised clustering algorithms based on data partitioning do not require any prior information. In practice, however, a small number of data samples often carry independent class labels or pairwise constraint information, and semi-supervised clustering was proposed to exploit this information to obtain better clustering results. Compared with traditional clustering methods, it can effectively improve clustering performance using only a small amount of supervised information. First, this paper introduces the research status and classification of semi-supervised learning and compares four categories of methods: decentralized model, support vector machine, graph, and collaborative training. Second, it describes semi-supervised clustering in detail, analyzes its current status, and introduces the Cop-kmeans, Lcop-kmeans, Seeded-kmeans, and SC-kmeans algorithms, among others. These methods illustrate the advantages of semi-supervised clustering over traditional clustering, and the related literature of recent years is summarized. Finally, the paper summarizes the latest developments in semi-supervised learning and semi-supervised clustering and discusses the applications of semi-supervised clustering and future research directions.






Funding
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61672522 and 61379101.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Qin, Y., Ding, S., Wang, L. et al. Research Progress on Semi-Supervised Clustering. Cogn Comput 11, 599–612 (2019). https://doi.org/10.1007/s12559-019-09664-w
DOI: https://doi.org/10.1007/s12559-019-09664-w