Abstract
Semi-supervised text classification has numerous applications and is particularly well suited to problems where large quantities of unlabeled data are readily available but only a small number of labeled training samples can be obtained. This paper proposes a semi-supervised classifier that integrates a clustering-based Expectation Maximization (EM) algorithm into radial basis function (RBF) neural networks and learns effectively from a very small number of labeled training samples and a large pool of unlabeled data. A generalized centroid clustering algorithm is also investigated to balance predictive values between labeled and unlabeled training data and to improve classification accuracy. Experimental results on three popular text classification corpora show that the proper use of additional unlabeled data in this semi-supervised approach can reduce classification errors by up to 26%.
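The general idea of combining centroid clustering, EM, and Gaussian RBF units can be illustrated with a toy sketch. The code below is not the algorithm from the paper, whose details are not given in the abstract. It assumes one Gaussian unit per class, centered at a class centroid, with normalized activations read as soft class memberships. The function names, the shared `width`, and the weight `lam` that down-weights unlabeled points are all hypothetical choices made for illustration.

```python
import math

def rbf(x, center, width):
    """Gaussian radial basis activation of x around center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def responsibilities(x, centers, width):
    """Normalized RBF activations, read as soft class memberships."""
    acts = [rbf(x, c, width) for c in centers]
    total = sum(acts) or 1.0  # guard against all activations underflowing to 0
    return [a / total for a in acts]

def weighted_centroid(points, weights):
    """Weighted mean of a list of equal-length vectors."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]

def fit_semi_supervised(labeled, labels, unlabeled, n_classes,
                        width=1.0, lam=0.5, n_iter=10):
    """EM-style centroid refinement (illustrative assumption, not the paper's method).

    Assumes every class has at least one labeled example.
    lam (hypothetical) scales the influence of unlabeled points.
    """
    # Initialise each center from the labeled examples of its class only.
    centers = []
    for k in range(n_classes):
        members = [x for x, y in zip(labeled, labels) if y == k]
        centers.append(weighted_centroid(members, [1.0] * len(members)))

    for _ in range(n_iter):
        # E-step: soft class memberships for every unlabeled point.
        resp = [responsibilities(x, centers, width) for x in unlabeled]
        # M-step: recompute centroids from labeled points (weight 1)
        # and unlabeled points (weight lam * membership).
        new_centers = []
        for k in range(n_classes):
            pts = [x for x, y in zip(labeled, labels) if y == k] + list(unlabeled)
            wts = [1.0 for y in labels if y == k] + [lam * r[k] for r in resp]
            new_centers.append(weighted_centroid(pts, wts))
        centers = new_centers
    return centers

def predict(x, centers, width=1.0):
    """Return the index of the class with the largest normalized activation."""
    r = responsibilities(x, centers, width)
    return max(range(len(r)), key=r.__getitem__)

# Toy usage: two labeled points, four unlabeled points in two clear groups.
labeled = [[0.0, 0.0], [4.0, 4.0]]
labels = [0, 1]
unlabeled = [[0.5, 0.2], [0.1, 0.6], [3.8, 4.2], [4.3, 3.9]]
centers = fit_semi_supervised(labeled, labels, unlabeled, n_classes=2)
print(predict([0.3, 0.3], centers), predict([4.1, 4.0], centers))
```

In this sketch the unlabeled points pull each centroid toward the denser part of its cluster, while `lam` keeps the few labeled points from being swamped. A full RBF network would typically also fit output-layer weights, which is omitted here.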
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, E.P. (2009). Semi-supervised Text Classification Using RBF Networks. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_9
DOI: https://doi.org/10.1007/978-3-642-03915-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03914-0
Online ISBN: 978-3-642-03915-7
eBook Packages: Computer Science, Computer Science (R0)