Abstract
We present a potentially substantial improvement for semi-supervised learning tasks in which only a small fraction of the training data pairs is labeled. In particular, we propose a novel decision strategy based on normalized model outputs. The paper compares the performance of two popular semi-supervised approaches (the Consistency Method and the Harmonic Gaussian Model) on unbalanced and balanced labeled data, with and without normalization of the models’ outputs. Experiments on text categorization problems suggest significant improvements in classification performance for models that use normalized outputs as the basis for the final decision.
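The abstract does not spell out the normalization, so the following is only a minimal sketch of the general idea, under stated assumptions: the Consistency Method outputs are computed with the iteration from Zhou et al. (F ← αSF + (1 − α)Y), and "normalization" is assumed here to mean standardizing each class column of the output matrix to zero mean and unit variance before taking the per-point argmax. The toy 1-D data, the parameter values, and all function names are hypothetical and chosen for illustration only.

```python
import math

def affinity(points, sigma):
    """Gaussian affinity matrix with a zero diagonal (1-D toy points)."""
    n = len(points)
    return [[0.0 if i == j
             else math.exp(-(points[i] - points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def consistency_outputs(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, with S = D^(-1/2) W D^(-1/2)."""
    n, c = len(Y), len(Y[0])
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n))
              + (1 - alpha) * Y[i][j]
              for j in range(c)] for i in range(n)]
    return F

def standardize_columns(F):
    """Assumed normalization: zero mean, unit variance per class column."""
    n, c = len(F), len(F[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [F[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return out

def decide(F):
    """Assign each point to the class with the largest output."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]

# Hypothetical toy data: two 1-D clusters with unbalanced labels
# (three labeled points in class 0, one labeled point in class 1).
points = [0.0, 0.2, 0.4, 0.6, 2.0, 2.2, 2.4, 2.6]
labels = {0: 0, 1: 0, 2: 0, 7: 1}  # point index -> class
Y = [[0.0, 0.0] for _ in points]
for idx, cls in labels.items():
    Y[idx][cls] = 1.0

F = consistency_outputs(affinity(points, sigma=0.5), Y)
raw_decisions = decide(F)                        # decision on raw outputs
norm_decisions = decide(standardize_columns(F))  # decision on normalized outputs
print("raw:       ", raw_decisions)
print("normalized:", norm_decisions)
```

The motivation for this kind of column-wise rescaling is that a class with more labeled points tends to produce larger raw outputs everywhere, which can bias a plain argmax toward that class; putting the columns on a common scale is one way to counteract that bias.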
References
Huang, T.M., Kecman, V.: SemiL, Software for solving semi-supervised learning problems, Auckland (2004). Available from http://www.support-vector.ws/html/semil.html or http://www.engineers.auckland.ac.nz/~vkec001
Kecman, V.: Learning and Soft Computing, Support Vector Machines, Neural Networks and Fuzzy Logic Systems. The MIT Press, Cambridge (2001)
Ng, A.Y., Jordan, M., Weiss, Y.: On Spectral Clustering: Analysis and an Algorithm. In: Dietterich, T.G., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14. MIT Press, Cambridge (2002)
Park, C.: Personal Communication, Tübingen (2004)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with Local and Global Consistency. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16, pp. 321–328. MIT Press, Cambridge (2004)
Zhu, X.-J., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), Washington DC (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, T.M., Kecman, V. (2004). Semi-supervised Learning from Unbalanced Labeled Data – An Improvement. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30134-9_107
DOI: https://doi.org/10.1007/978-3-540-30134-9_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23205-6
Online ISBN: 978-3-540-30134-9
eBook Packages: Springer Book Archive