PE-PUC: A Graph Based PU-Learning Approach for Text Classification

Yu, Shuang; Li, Chunping

doi:10.1007/978-3-540-73499-4_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

This paper presents a novel solution to the problem of building a text classifier from positive documents (P) and unlabeled documents (U), where the unlabeled set is a mixture of positive and negative documents. This problem is known as PU-Learning; its key feature is that no negative documents are available for training. Several approaches have recently been proposed for this problem. Most share the same idea of building a classifier in two steps, with each technique using a different method for each step. Generally speaking, these existing approaches do not perform well when P is small. In this paper, we propose a new approach that aims to improve performance when P is small by combining graph-based semi-supervised learning with the two-step method. Experiments indicate that the proposed method performs well, especially when P is small.
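
The abstract does not say how PE-PUC implements either step, so the following is only a minimal sketch of the general idea it describes: a graph-based scoring pass followed by a two-step classifier. It is not the authors' algorithm. Every name here (`propagate_scores`, `two_step_pu`, `alpha`, `neg_fraction`) is hypothetical, and the choices of a cosine-similarity graph, a simple random-walk style score propagation, and a centroid-based final classifier are assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of term-frequency vectors (scale does not matter for cosine)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def propagate_scores(pos_vecs, unl_vecs, iterations=20, alpha=0.8):
    """Spread 'positiveness' from P over a similarity graph on P and U.

    Assumed scheme: each node repeatedly takes the weighted average of its
    neighbours' scores, mixed with its own seed value (1 for P, 0 for U).
    """
    nodes = pos_vecs + unl_vecs
    n, n_pos = len(nodes), len(pos_vecs)
    # Dense cosine-similarity graph without self-loops.
    w = [[cosine(nodes[i], nodes[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    seed = [1.0] * n_pos + [0.0] * (n - n_pos)
    score = seed[:]
    for _ in range(iterations):
        new = []
        for i in range(n):
            deg = sum(w[i])
            spread = (sum(w[i][j] * score[j] for j in range(n)) / deg
                      if deg else 0.0)
            new.append(alpha * spread + (1 - alpha) * seed[i])
        score = new
    return score[n_pos:]  # scores for the unlabeled documents only


def two_step_pu(pos_docs, unl_docs, neg_fraction=0.5):
    """Step 1: pick likely negatives from U. Step 2: build a classifier."""
    pos_vecs = [bow(d) for d in pos_docs]
    unl_vecs = [bow(d) for d in unl_docs]
    scores = propagate_scores(pos_vecs, unl_vecs)
    # Step 1: the lowest-scoring unlabeled documents are treated as negatives.
    order = sorted(range(len(unl_vecs)), key=lambda i: scores[i])
    k = max(1, int(len(order) * neg_fraction))
    reliable_neg = [unl_vecs[i] for i in order[:k]]
    # Step 2: nearest-centroid classifier on P versus the selected negatives.
    pos_c, neg_c = centroid(pos_vecs), centroid(reliable_neg)

    def classify(text):
        v = bow(text)
        return cosine(v, pos_c) > cosine(v, neg_c)

    return classify


# Toy data, invented for this example.
P = ["graph learning text classification",
     "text classification with graph methods"]
U = ["semi supervised text classification graph",
     "football match score goal",
     "goal keeper football league",
     "graph based learning for text"]

classify = two_step_pu(P, U)
print(classify("text classification graph"))
print(classify("football league goal"))
```

In this toy run the two football documents share no terms with P, so they receive no propagated score and are selected as the negative set. On real corpora the graph construction, the propagation rule, and the selection threshold would all need tuning, and a stronger second-step classifier would normally replace the centroid comparison.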

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing 100084, China
Shuang Yu & Chunping Li

Editor information

Petra Perner

About this paper

Cite this paper

Yu, S., Li, C. (2007). PE-PUC: A Graph Based PU-Learning Approach for Text Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_43

DOI: https://doi.org/10.1007/978-3-540-73499-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)
