
Research Progress on Semi-Supervised Clustering


Abstract

Semi-supervised clustering is a learning method that combines semi-supervised learning (SSL) with cluster analysis, and it has attracted wide attention and application in machine learning. Traditional unsupervised clustering partitions the data without any prior information; in practice, however, a small number of samples carry independent class labels or pairwise constraints. To exploit this information and obtain better clustering results, scholars have proposed semi-supervised clustering, which can markedly improve clustering performance with only a small amount of supervision. This paper first reviews the research status and taxonomy of semi-supervised learning and compares four families of methods: generative models, semi-supervised support vector machines, graph-based methods, and co-training. It then describes semi-supervised clustering in detail, analyzes its current state, and introduces representative algorithms such as Cop-kmeans, Lcop-kmeans, Seeded-kmeans, and SC-kmeans. These methods illustrate the advantages of semi-supervised clustering over traditional clustering, and the related literature of recent years is surveyed. Finally, the paper summarizes the latest developments in semi-supervised learning and semi-supervised clustering and discusses applications and future research directions.
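As a concrete illustration of the constraint-based algorithms named above, the following is a minimal, hypothetical sketch of a Cop-kmeans-style assignment step in Python: each point is assigned to the nearest centroid that does not violate any must-link or cannot-link constraint, and the pass fails if no feasible cluster exists. The function names and data layout are illustrative assumptions, not the implementation discussed in the paper.

    # Hypothetical sketch (not the authors' code): a Cop-kmeans-style
    # constrained assignment step. Each point goes to the nearest centroid
    # that keeps all must-link and cannot-link constraints satisfied.
    import numpy as np

    def violates(i, cluster, assign, must_link, cannot_link):
        """Would assigning point i to `cluster` break any constraint?"""
        for a, b in must_link:
            if i == a and assign.get(b, cluster) != cluster:
                return True
            if i == b and assign.get(a, cluster) != cluster:
                return True
        for a, b in cannot_link:
            if i == a and assign.get(b) == cluster:
                return True
            if i == b and assign.get(a) == cluster:
                return True
        return False

    def cop_kmeans_assign(X, centroids, must_link, cannot_link):
        """One constrained assignment pass; returns labels, or None if infeasible."""
        assign = {}
        for i, x in enumerate(X):
            # Candidate clusters ordered by distance to x.
            order = np.argsort(np.linalg.norm(centroids - x, axis=1))
            for c in order:
                if not violates(i, int(c), assign, must_link, cannot_link):
                    assign[i] = int(c)
                    break
            else:
                return None  # no feasible cluster: Cop-kmeans reports failure
        return np.array([assign[i] for i in range(len(X))])

With X an (n, d) array, centroids a (k, d) array, and constraints given as lists of index pairs, the full algorithm would alternate this assignment step with the usual k-means centroid update until convergence.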



Funding

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61672522 and 61379101.

Author information

Corresponding author

Correspondence to Shifei Ding.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Qin, Y., Ding, S., Wang, L. et al. Research Progress on Semi-Supervised Clustering. Cogn Comput 11, 599–612 (2019). https://doi.org/10.1007/s12559-019-09664-w

