Abstract
This paper presents a novel solution to the problem of building a text classifier from positive documents (P) and unlabeled documents (U), where U contains a mixture of positive and negative documents. This problem is known as PU-Learning; its key feature is that no negative documents are available for training. Several approaches have recently been proposed for this problem. Most of them share the same idea of building a classifier in two steps, and they differ in the method used for each step. Generally speaking, these existing approaches do not perform well when P is small. In this paper, we propose a new approach aimed at improving performance when P is small. It combines a graph-based semi-supervised learning method with the two-step method. Experiments indicate that the proposed method performs well, especially when P is small.
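The abstract does not spell out either step, so the following is only a minimal sketch of how a graph-based method might be combined with a two-step scheme; it is not the PE-PUC algorithm itself. It assumes the reading of the two-step strategy that is common in the PU-Learning literature: step 1 extracts reliable negatives from U, and step 2 trains a classifier on P and those negatives. Every name here (`cosine_affinity`, `spread_positive_scores`, `two_step_pu`, `negative_fraction`, `alpha`) is hypothetical, as are the choice of normalized-graph label spreading for step 1, the nearest-centroid classifier for step 2, and the toy term-count matrix.

```python
import numpy as np


def cosine_affinity(X):
    """Dense cosine-similarity graph over documents, with a zero diagonal."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    W = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(W, 0.0)
    return W


def spread_positive_scores(W, is_positive, alpha=0.9):
    """Label spreading seeded by P only: solve (I - alpha * S) f = y."""
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.where(degree == 0, 1.0, degree))
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]  # symmetric normalization
    seed = is_positive.astype(float)
    return np.linalg.solve(np.eye(len(seed)) - alpha * S, seed)


def two_step_pu(X, is_positive, negative_fraction=0.4, alpha=0.9):
    # Step 1 (assumed): unlabeled documents with the lowest propagated
    # positive score are treated as reliable negatives.
    scores = spread_positive_scores(cosine_affinity(X), is_positive, alpha)
    unlabeled = np.flatnonzero(~is_positive)
    n_neg = max(1, int(negative_fraction * len(unlabeled)))
    reliable_neg = unlabeled[np.argsort(scores[unlabeled])[:n_neg]]

    # Step 2 (assumed stand-in): nearest-centroid classifier built from P
    # and the reliable negatives; any other classifier could be used here.
    centroids = np.stack([X[is_positive].mean(axis=0),
                          X[reliable_neg].mean(axis=0)])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = X @ centroids.T
    return sims[:, 0] > sims[:, 1], reliable_neg


# Toy term-count matrix (made up): documents 0-2 share one topic,
# documents 3-5 another; only document 0 is labeled positive.
X = np.array([
    [3, 1, 0, 0, 1],
    [2, 2, 0, 0, 1],
    [1, 3, 0, 0, 1],
    [0, 0, 2, 2, 1],
    [0, 0, 3, 1, 1],
    [0, 0, 1, 3, 1],
], dtype=float)
is_positive = np.array([True, False, False, False, False, False])

pred, reliable_neg = two_step_pu(X, is_positive)
print(pred, reliable_neg)
```

With a single labeled positive, the graph lets the label reach unlabeled documents that are close to P before any negatives are chosen, which is one plausible reason a graph-based step could help when P is small.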
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, S., Li, C. (2007). PE-PUC: A Graph Based PU-Learning Approach for Text Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_43
DOI: https://doi.org/10.1007/978-3-540-73499-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science (R0)