Abstract
Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to a predefined class outside the labeled data categories. This problem has been widely studied in recent years, and semi-supervised learning is an efficient way to learn from positive and unlabeled examples (PU learning). Among the semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. This paper proposes an automatic KL-divergence-based semi-supervised learning method that uses knowledge of the unlabeled data distribution. In addition, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of the existing methods. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning, and (2) the proposed framework achieves higher precision than the state-of-the-art method.
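The abstract does not spell out how the KL divergence enters the method. As a minimal sketch of the quantity itself, and not of the authors' algorithm, the Python below computes the KL divergence between two discrete class distributions, for example an assumed class prior and the class proportions a classifier predicts on the unlabeled set. The function names `kl_divergence` and `class_distribution`, the `eps` floor, and the example labels are all hypothetical and chosen for illustration only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(p || q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same classes.
    eps is an assumed floor on q that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def class_distribution(labels, classes):
    """Return the empirical proportion of each class in labels."""
    counts = {c: 0 for c in classes}
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [counts[c] / n for c in classes]


# Hypothetical example: an assumed prior versus predicted proportions.
assumed_prior = [0.5, 0.5]
predicted = class_distribution(["pos", "neg", "neg", "neg"], ["pos", "neg"])
divergence = kl_divergence(assumed_prior, predicted)
```

A divergence of zero means the two distributions coincide, and larger values mean the predicted class proportions drift further from the assumed ones. Note that the measure is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.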
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, H., Sha, C., Wang, X., Zhou, A. (2012). Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_3
DOI: https://doi.org/10.1007/978-3-642-29253-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science (R0)