Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning

Hu, Haoji; Sha, Chaofeng; Wang, Xiaoling; Zhou, Aoying

doi:10.1007/978-3-642-29253-8_3


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Included in the following conference series:

Asia-Pacific Web Conference


Abstract

Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, and the unlabeled data serves as the test set. In practice, labeled data is often hard to obtain, and the unlabeled data also contains instances of the predefined class that are not in the labeled set. This problem has been widely studied in recent years, and semi-supervised learning is an efficient way to learn from positive and unlabeled examples (PU learning). Among the existing semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled-data distribution. This paper proposes an automatic, KL-divergence-based semi-supervised learning method that uses knowledge of the unlabeled-data distribution. In addition, a new framework is designed to integrate different semi-supervised PU learning algorithms so as to take advantage of the existing methods. The experimental results show that (1) data-distribution information is very helpful for semi-supervised PU learning; and (2) the proposed framework can achieve higher precision than the state-of-the-art method.
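The abstract does not explain how the KL divergence enters the method. The sketch below therefore illustrates only the quantity itself and is not the authors' algorithm. It is a minimal Python example that makes two assumptions: documents are token lists, and the distributions being compared are Laplace-smoothed unigram term distributions of a positive set and an unlabeled set. The names `kl_divergence` and `term_distribution`, the smoothing parameter `alpha`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Return D(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats.

    p and q map outcomes to probabilities. If Q(x) == 0 where P(x) > 0,
    the divergence is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # by convention 0 * log(0 / q) contributes 0
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total


def term_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over a fixed vocabulary.

    Hypothetical helper: tokens outside vocab are ignored, and every
    vocabulary term gets pseudo-count alpha so no probability is zero.
    """
    counts = Counter(w for doc in docs for w in doc if w in vocab)
    denom = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}


if __name__ == "__main__":
    # Toy data (hypothetical): a few positive documents and an unlabeled
    # set that mixes positive-looking and negative-looking documents.
    positive = [["ball", "goal", "team"], ["goal", "match"]]
    unlabeled = [["goal", "team"], ["stock", "market"], ["market", "price"]]
    vocab = {w for doc in positive + unlabeled for w in doc}

    p_pos = term_distribution(positive, vocab)
    p_unl = term_distribution(unlabeled, vocab)
    print(f"D(P_pos || P_unl) = {kl_divergence(p_pos, p_unl):.4f} nats")
```

Smoothing keeps every term probability positive, so the divergence between the two estimated distributions stays finite. A larger value only means that the unlabeled term distribution departs further from the positive one. How such a signal would be turned into a choice among PU learning algorithms is not stated in the abstract.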



Author information

Authors and Affiliations

Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, China
Haoji Hu, Xiaoling Wang & Aoying Zhou
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China
Chaofeng Sha & Aoying Zhou


Editor information

Editors and Affiliations

School of Computer Science, The University of Adelaide, Australia
Quan Z. Sheng
College of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
Guoren Wang
Aarhus University, Denmark
Christian S. Jensen
Center for Applied Informatics, Victoria University, PO Box 14428, 8001, VIC, Australia
Guandong Xu


About this paper

Cite this paper

Hu, H., Sha, C., Wang, X., Zhou, A. (2012). Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_3


DOI: https://doi.org/10.1007/978-3-642-29253-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science, Computer Science (R0)
