Abstract
This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. We call this problem PU-oriented text classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Unlike previous work on PU-oriented text classification, we construct the final classifier from the weighted vote of all classifiers generated during the iterations, instead of choosing a single one of them as the final classifier. Experimental results on the Reuters data set show that our method increases classifier performance (F1-measure) by 1.734 percent compared with PEBL.
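To make the two-step idea concrete, the following is a minimal sketch and not the authors' implementation. It shows the baseline 1-DNF heuristic as commonly described for PEBL (a word is a "positive feature" if it occurs in a larger fraction of positive documents than of unlabeled documents, and an unlabeled document containing no positive feature is taken as a reliable negative), followed by a weighted vote over a set of classifiers. The improved 1-DNF variant and the way the vote weights are derived are not specified in the abstract, so the function names, the token-list document representation, and the weights used here are all assumptions for illustration. The iterative SVM step is omitted and the classifiers are represented as plain callables returning +1 or -1.

```python
from collections import Counter


def positive_features(positive_docs, unlabeled_docs):
    """Baseline 1-DNF feature selection (assumed form, not the paper's variant).

    A word is kept if its document frequency, as a fraction, is strictly
    higher in the positive set than in the unlabeled set.
    Both inputs are non-empty lists of token lists.
    """
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    df_pos = Counter(w for doc in positive_docs for w in set(doc))
    df_unl = Counter(w for doc in unlabeled_docs for w in set(doc))
    # Counter returns 0 for words never seen in the unlabeled set.
    return {w for w, c in df_pos.items() if c / n_pos > df_unl[w] / n_unl}


def reliable_negatives(positive_docs, unlabeled_docs):
    """Unlabeled documents that contain none of the positive features."""
    features = positive_features(positive_docs, unlabeled_docs)
    return [doc for doc in unlabeled_docs if features.isdisjoint(doc)]


def weighted_vote(classifiers, weights, doc):
    """Combine classifiers that each return +1 or -1 for a document.

    The weights are hypothetical inputs; the abstract does not say how
    they are computed.
    """
    score = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if score > 0 else -1


# Toy data: two positive documents and three unlabeled ones.
P = [["svm", "kernel", "margin"], ["svm", "text", "kernel"]]
U = [["svm", "kernel", "text"],
     ["football", "goal", "text"],
     ["stock", "market", "goal"]]

negatives = reliable_negatives(P, U)

# Three stand-in classifiers; the first is trusted more than the others.
clfs = [lambda d: 1, lambda d: -1, lambda d: -1]
label = weighted_vote(clfs, [3, 1, 1], U[0])
```

In this toy run the first unlabeled document shares "svm" and "kernel" with the positive set and is therefore not selected as a reliable negative, while the other two are. The vote illustrates why weighting matters: a simple majority of the three stand-in classifiers would answer -1, but the heavier weight on the first classifier flips the result to +1.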
References
Yang, Y., Pedersen, J.P.: Feature selection in statistical learning of text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 412–420 (1997)
Han, E.S., Karypis, G., Kumar, V.: Text categorization using weight adjusted k-nearest neighbor classification, Computer Science Technical Report TR99-019 (1999)
Levis, D., Ringuette, M.: A comparison of two learning algorithms for text classification. In: Third Annual Symposium on Document Analysis and Information Retrieval, pp. 81–93 (1994)
Cortes, C., Vapnik, V.: Support vector networks. Machine learning 20, 273–297 (1995)
Cohen, W.J., Singer, Y.: Context-sensitive learning methods for text categorization. In: SIGIR 1996: Proc. 19th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 307–315 (1996)
Weiss, S.M., Apte, C., Damerau, F.J.: Maximizing Text-Mining Performance. IEEE Intelligent Systems, 2–8 (1999)
Wiener, E., Pedersen, J.O., Weigend, A.S.: A neural network approach to topic spotting. In: Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR), pp. 22–34 (1995)
Denis, F.: PAC learning from positive statistical queries. In: Workshop on Algorithmic Learning Theory, ALT (1998)
Letouzey, F., Denis, F., Gilleron, R.: Learning from positive and unlabeled examples. In: Workshop on Algorithmic Learning Theory, ALT (2000)
DeComite, F., Denis, F., Gilleron, R.: Positive and unlabeled examples help learning. In: Workshop on Algorithmic Learning Theory, ALT (1999)
Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially supervised classification of text documents. In: The Nineteenth International Conference on Machine Learning (ICML 2002), pp. 384–397 (2002)
Li, X., Liu, B.: Learning to classify text using positive and unlabeled data. In: The International Joint Conference on Artificial Intelligence, IJCAI (2003)
Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proceedings of the Third IEEE International Conference on Data Mining, ICDM, pp. 179–187 (2003)
Yu, H., Han, J., Chang, K.C.-C.: PEBL: Positive example based learning for Web page classification using SVM. In: The International Conference on Knowledge Discovery and Data Mining, KDD (2002)
Denis, F., Gilleron, R., Tommasi, M.: Text classification from positive and unlabeled examples. In: Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU (2002)
Manevitz, L.M., Yousef, M.: One-Class SVMs for document classification. Journal of Machine Learning Research 2, 139–154 (2001)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, H., Zuo, W., Peng, T. (2005). A New PU Learning Algorithm for Text Classification. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_84
DOI: https://doi.org/10.1007/11579427_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science (R0)