A New PU Learning Algorithm for Text Classification

  • Conference paper
MICAI 2005: Advances in Artificial Intelligence (MICAI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Abstract

This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. We call this problem PU-oriented text classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Unlike previous work on PU-oriented text classification, we construct the final classifier from the weighted vote of all classifiers generated during the iterations, instead of choosing one of them as the final classifier. Experimental results on the Reuters data set show that our method increases classifier performance (F1-measure) by 1.734 percent compared with PEBL.
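
The abstract does not spell out the improved 1-DNF variant or the voting weights, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It shows the baseline 1-DNF heuristic as commonly described for PEBL (a word is a "positive feature" if it occurs in a larger fraction of positive documents than of unlabeled ones, and an unlabeled document containing no positive feature is treated as a reliable negative), followed by a generic weighted vote over classifiers that return +1 or -1. The function names, the representation of documents as token lists, and the weights are hypothetical and are not taken from the paper.

```python
from collections import Counter


def positive_features(positive_docs, unlabeled_docs):
    """Words whose document-frequency fraction is higher in P than in U.

    Assumes both collections are non-empty lists of token lists.
    """
    pos_df = Counter(w for doc in positive_docs for w in set(doc))
    unl_df = Counter(w for doc in unlabeled_docs for w in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    # Counter returns 0 for words never seen in the unlabeled set.
    return {w for w, c in pos_df.items() if c / n_pos > unl_df[w] / n_unl}


def reliable_negatives(positive_docs, unlabeled_docs):
    """Indices of unlabeled documents that contain no positive feature."""
    features = positive_features(positive_docs, unlabeled_docs)
    return [i for i, doc in enumerate(unlabeled_docs)
            if features.isdisjoint(doc)]


def weighted_vote(classifiers, weights, doc):
    """Combine +1/-1 predictions; the weighting scheme is an assumption."""
    score = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    P = [["oil", "price"], ["oil", "barrel"]]
    U = [["oil", "market"], ["football", "match"],
         ["election", "vote"], ["price", "football"]]
    print(reliable_negatives(P, U))
```

In the two-step setting the abstract describes, the indices returned by `reliable_negatives` would seed the negative training set for the first SVM, and each later iteration would contribute one more classifier to the list passed to `weighted_vote`.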

References

  1. Yang, Y., Pedersen, J.O.: Feature selection in statistical learning of text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 412–420 (1997)

  2. Han, E.S., Karypis, G., Kumar, V.: Text categorization using weight adjusted k-nearest neighbor classification. Computer Science Technical Report TR99-019 (1999)

  3. Lewis, D., Ringuette, M.: A comparison of two learning algorithms for text classification. In: Third Annual Symposium on Document Analysis and Information Retrieval, pp. 81–93 (1994)

  4. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)

  5. Cohen, W.W., Singer, Y.: Context-sensitive learning methods for text categorization. In: SIGIR 1996: Proc. 19th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 307–315 (1996)

  6. Weiss, S.M., Apte, C., Damerau, F.J.: Maximizing Text-Mining Performance. IEEE Intelligent Systems, 2–8 (1999)

  7. Wiener, E., Pedersen, J.O., Weigend, A.S.: A neural network approach to topic spotting. In: Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR), pp. 22–34 (1995)

  8. Denis, F.: PAC learning from positive statistical queries. In: Workshop on Algorithmic Learning Theory, ALT (1998)

  9. Letouzey, F., Denis, F., Gilleron, R.: Learning from positive and unlabeled examples. In: Workshop on Algorithmic Learning Theory, ALT (2000)

  10. DeComite, F., Denis, F., Gilleron, R.: Positive and unlabeled examples help learning. In: Workshop on Algorithmic Learning Theory, ALT (1999)

  11. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially supervised classification of text documents. In: The Nineteenth International Conference on Machine Learning (ICML 2002), pp. 384–397 (2002)

  12. Li, X., Liu, B.: Learning to classify text using positive and unlabeled data. In: The International Joint Conference on Artificial Intelligence, IJCAI (2003)

  13. Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proceedings of the Third IEEE International Conference on Data Mining, ICDM, pp. 179–187 (2003)

  14. Yu, H., Han, J., Chang, K.C.-C.: PEBL: Positive example based learning for Web page classification using SVM. In: The International Conference on Knowledge Discovery and Data Mining, KDD (2002)

  15. Denis, F., Gilleron, R., Tommasi, M.: Text classification from positive and unlabeled examples. In: Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU (2002)

  16. Manevitz, L.M., Yousef, M.: One-Class SVMs for document classification. Journal of Machine Learning Research 2, 139–154 (2001)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, H., Zuo, W., Peng, T. (2005). A New PU Learning Algorithm for Text Classification. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_84

  • DOI: https://doi.org/10.1007/11579427_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29896-0

  • Online ISBN: 978-3-540-31653-4

  • eBook Packages: Computer Science, Computer Science (R0)