Abstract
Semi-supervised classification methods can, in some cases, perform worse than their supervised counterparts. This reduces confidence in them in real applications, so it is desirable to make semi-supervised classification safe, in the sense that it never performs worse than the supervised counterpart. One possible cause of unsafe learning is that the cluster assumption may not reflect the real data distribution well. We therefore develop a safe semi-supervised support vector machine method that adjusts the cluster assumption (ACA-S3VM for short). Specifically, when samples from different classes overlap heavily, the real boundary does not lie in a low-density region, so the cluster assumption will not find it. An unsupervised clustering method, however, is able to detect the real boundary in this case. Accordingly, we design ACA-S3VM to adjust the cluster assumption with the help of clustering, taking into account the distance of each unlabeled instance to the distribution boundary during learning. Empirical results show that ACA-S3VM is competitive with off-the-shelf safe semi-supervised classification methods.
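To make the idea of "distance of an unlabeled instance to the distribution boundary" concrete, the following is a minimal sketch and not the ACA-S3VM formulation itself, which the abstract does not specify. It assumes a plain 2-means clustering stands in for the unsupervised clustering step, takes the boundary to be the hyperplane equidistant from the two centroids, and uses a hypothetical `boundary_weight` function to turn distances into per-instance weights. All function names and the exponential weighting are illustrative assumptions.

```python
import math
import random


def two_means(points, iters=50):
    """Lloyd's algorithm with k=2 on a list of equal-length tuples."""
    c0, c1 = min(points), max(points)  # deterministic initial centroids
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_c1 = math.dist(p, c1) < math.dist(p, c0)
            groups[1 if nearer_c1 else 0].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        updated = tuple(
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, (c0, c1))
        )
        if updated == (c0, c1):
            break
        c0, c1 = updated
    return c0, c1


def signed_distance(x, c0, c1):
    """Signed distance from x to the hyperplane equidistant from c0 and c1."""
    direction = [b - a for a, b in zip(c0, c1)]
    norm = math.hypot(*direction)
    midpoint = [(a + b) / 2 for a, b in zip(c0, c1)]
    return sum((xi - mi) * di for xi, mi, di in zip(x, midpoint, direction)) / norm


def boundary_weight(distance, scale=1.0):
    """Hypothetical weight in [0, 1): 0 at the boundary, growing with distance."""
    return 1.0 - math.exp(-abs(distance) / scale)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two heavily overlapping classes: no low-density gap separates them.
    unlabeled = [(rng.gauss(-1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    unlabeled += [(rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    c0, c1 = two_means(unlabeled)
    for x in unlabeled[:5]:
        d = signed_distance(x, c0, c1)
        print(f"x=({x[0]:+.2f}, {x[1]:+.2f})  d={d:+.2f}  weight={boundary_weight(d):.2f}")
```

In this sketch, instances close to the clustering boundary receive weights near zero, so a downstream semi-supervised learner could rely on them less than on instances deep inside a cluster. How ACA-S3VM actually incorporates these distances into its objective is described in the full paper, not here.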


Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61300165, 61375057 and 61300164, the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20133223120009, and the Introduction of Talent Research Foundation of Nanjing University of Posts and Telecommunications under Grant Nos. NY213033 and NY213031.
Cite this article
Wang, Y., Meng, Y., Fu, Z. et al. Towards Safe Semi-supervised Classification: Adjusted Cluster Assumption via Clustering. Neural Process Lett 46, 1031–1042 (2017). https://doi.org/10.1007/s11063-017-9607-5
DOI: https://doi.org/10.1007/s11063-017-9607-5