Abstract
This paper compares methods for selecting a small amount of useful unlabeled data to improve the classification accuracy of semi-supervised learning (SSL) algorithms. Three selection approaches are considered and compared empirically: the simply adjusted approach based on an uncertainty level, the normalized-and-adjusted approach, and the entropy-based adjusted approach. The experiments use semi-supervised support vector machines (S3VMs) on synthetic and real-life benchmark data. The results show that the entropy-based approach achieves slightly higher classification accuracy than the other two.
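The abstract does not define the three selection criteria, so the following is only a rough illustration of the general idea behind entropy-based selection. It ranks unlabeled samples by the Shannon entropy of their estimated class probabilities and keeps the k samples with the lowest entropy, which are the most confidently classified. For an SVM, such probabilities could be obtained through Platt scaling, but that is an assumption on our part. Treating low entropy as "useful" is also an assumption. The helper names `entropy` and `select_by_entropy` are hypothetical, and the paper's "adjusted" entropy criterion is not reproduced here.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one class-probability vector."""
    # Zero-probability classes contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(prob_matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabeled samples with the lowest entropy.

    Hypothetical helper: assumes each row holds estimated class
    probabilities (e.g. Platt-scaled SVM outputs) and that low entropy,
    i.e. a confident prediction, marks a sample as useful. The paper's
    adjusted criterion may differ.
    """
    scores = [(entropy(row), i) for i, row in enumerate(prob_matrix)]
    scores.sort()  # ascending entropy; ties broken by sample index
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01], [0.7, 0.3]]
    print(select_by_entropy(probs, 2))  # prints [2, 1]
```

In a self-training loop, the selected samples would typically be pseudo-labeled and added to the labeled set before the classifier is retrained. The sketch covers only the ranking step.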
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Le, TB., Kim, SW. (2015). Comparison of Adjusted Methods for Selecting Useful Unlabeled Data for Semi-Supervised Learning Algorithms. In: Ali, M., Kwon, Y., Lee, CH., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_51
DOI: https://doi.org/10.1007/978-3-319-19066-2_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19065-5
Online ISBN: 978-3-319-19066-2
eBook Packages: Computer Science, Computer Science (R0)