
Semi-supervised Text Classification Using RBF Networks

  • Conference paper
Advances in Intelligent Data Analysis VIII (IDA 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5772)


Abstract

Semi-supervised text classification has numerous applications and is particularly suited to problems where large quantities of unlabeled data are readily available but only a small number of labeled training samples are accessible. This paper proposes a semi-supervised classifier that integrates a clustering-based Expectation Maximization (EM) algorithm into radial basis function (RBF) neural networks and can learn effectively from a very small number of labeled training samples and a large pool of unlabeled data. A generalized centroid clustering algorithm is also investigated to balance predictive values between labeled and unlabeled training data and to improve classification accuracy. Experimental results on three popular text classification corpora show that the proper use of additional unlabeled data in this semi-supervised approach can reduce classification errors by up to 26%.
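The general idea of combining centroid clustering, EM-style refinement, and an RBF network can be sketched as follows. This is only an illustrative sketch and not the paper's algorithm, whose details are not given in the abstract. It assumes one Gaussian RBF center per class, dense feature vectors (for text, something like TF-IDF rows), and a hypothetical weight `lam` that scales the influence of unlabeled documents relative to labeled ones. The function names, `sigma`, `lam`, and `n_iter` are all assumptions introduced here.

```python
import numpy as np


def rbf_activations(X, centers, sigma):
    """Gaussian RBF activation of every sample against every center."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_semi_supervised_rbf(X_lab, y_lab, X_unl, n_classes,
                            sigma=1.0, lam=0.5, n_iter=10):
    """Illustrative sketch: refine class centroids with unlabeled data,
    then fit a linear output layer on the labeled data.

    Assumes every class has at least one labeled sample.
    `lam` (hypothetical) down-weights unlabeled samples.
    """
    onehot = np.eye(n_classes)[y_lab]              # (n_lab, n_classes)
    lab_sum = onehot.T @ X_lab                     # per-class feature sums
    lab_cnt = onehot.sum(axis=0)                   # per-class labeled counts
    centers = lab_sum / lab_cnt[:, None]           # initial centroids

    for _ in range(n_iter):
        # E-step: soft class memberships of the unlabeled samples.
        act = rbf_activations(X_unl, centers, sigma)
        resp = act / (act.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: weighted centroids from labeled and unlabeled samples.
        w_unl = lam * resp
        num = lab_sum + w_unl.T @ X_unl
        den = lab_cnt + w_unl.sum(axis=0)
        centers = num / den[:, None]

    # Output layer: least-squares weights from hidden activations to labels.
    phi = rbf_activations(X_lab, centers, sigma)
    W, *_ = np.linalg.lstsq(phi, onehot, rcond=None)
    return centers, W


def predict(X, centers, W, sigma=1.0):
    """Return the index of the highest-scoring class for each sample."""
    return (rbf_activations(X, centers, sigma) @ W).argmax(axis=1)
```

Calling `fit_semi_supervised_rbf` with a handful of labeled rows and a larger unlabeled matrix returns the refined centers and output weights, which `predict` then applies to new rows using the same `sigma`. Setting `lam` to 0 reduces the sketch to a purely supervised centroid-based RBF classifier, which makes it easy to compare against the semi-supervised setting.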




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, E.P. (2009). Semi-supervised Text Classification Using RBF Networks. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_9


  • DOI: https://doi.org/10.1007/978-3-642-03915-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03914-0

  • Online ISBN: 978-3-642-03915-7

  • eBook Packages: Computer Science, Computer Science (R0)
