A New PU Learning Algorithm for Text Classification

Yu, Hailong; Zuo, Wanli; Peng, Tao

doi:10.1007/11579427_84

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. We call this problem PU-oriented text classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented during the iteration. Unlike previous work on PU-oriented text classification, we construct the final classifier from the weighted vote of all classifiers generated during the iteration steps instead of choosing one of them as the final classifier. Experimental results on the Reuters data set show that our method improves classifier performance (F1-measure) by 1.734 percent compared with PEBL.
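The abstract does not spell out the improved 1-DNF step or the voting weights, so the following is only a minimal sketch under stated assumptions. It shows the baseline 1-DNF heuristic as commonly described in the PU learning literature (a word is a "positive feature" if it occurs in a larger fraction of positive documents than of unlabeled documents, and an unlabeled document containing no positive feature is taken as a reliable negative), plus a generic weighted vote over classifier outputs. The function names, the toy documents, and the weights are hypothetical and are not taken from the paper.

```python
from collections import Counter


def positive_features(P, U):
    """Baseline 1-DNF (assumed form, not the paper's improved variant):
    keep words whose document-frequency ratio is higher in P than in U."""
    df_p = Counter(w for doc in P for w in set(doc))
    df_u = Counter(w for doc in U for w in set(doc))
    return {w for w, n in df_p.items()
            if n / len(P) > df_u.get(w, 0) / len(U)}


def reliable_negatives(P, U):
    """Unlabeled documents that contain no positive feature at all."""
    pf = positive_features(P, U)
    return [doc for doc in U if not pf & set(doc)]


def weighted_vote(predictions, weights):
    """Combine +1/-1 predictions from several classifiers.
    The weights are hypothetical; the abstract does not say how they are set."""
    total = sum(w * p for p, w in zip(predictions, weights))
    return 1 if total > 0 else -1


# Toy tokenized documents (illustrative only).
P = [["wheat", "export", "grain"], ["grain", "price", "wheat"]]
U = [["wheat", "grain", "harvest"], ["stock", "market", "price"],
     ["football", "match", "goal"], ["market", "bank", "rate"]]

RN = reliable_negatives(P, U)
print(RN)
print(weighted_vote([1, -1, -1], [0.5, 0.3, 0.1]))
```

In a full two-step pipeline, the reliable negatives would seed the iterative SVM training of the second step, and each iteration's classifier would contribute one entry to the vote.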

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, 130012, China
Hailong Yu, Wanli Zuo & Tao Peng

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh
Tecnológico de Monterrey (ITESM), Campus Ciudad de México (CCM), Calle del Puente 222, Col. Ejidos de Huipulco, 14360 DF, Tlalpan, Mexico
Álvaro de Albornoz
Center for Intelligent Systems, Tecnológico de Monterrey, Campus Monterrey, 64849, Monterrey, N.L., Mexico
Hugo Terashima-Marín

About this paper

Cite this paper

Yu, H., Zuo, W., Peng, T. (2005). A New PU Learning Algorithm for Text Classification. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_84

DOI: https://doi.org/10.1007/11579427_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)