Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning

  • Conference paper
Web Technologies and Applications (APWeb 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent years, and semi-supervised learning is an efficient solution for learning from positive and unlabeled examples (PU learning). Among the semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. This paper proposes an automatic KL-divergence-based semi-supervised learning method that uses knowledge of the unlabeled data distribution. In addition, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of the existing methods. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning, and (2) the proposed framework achieves higher precision than the state-of-the-art method.
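For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of the standard KL divergence between two discrete distributions. It is not the paper's algorithm, which the abstract does not specify. The function name `kl_divergence`, the `eps` floor, and the two toy distributions are assumptions made purely for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    eps is an illustrative floor that avoids division by zero when
    q assigns zero mass to an outcome that p supports.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical toy distributions (not data from the paper).
labeled_positive = [0.7, 0.2, 0.1]
unlabeled = [0.4, 0.3, 0.3]

divergence = kl_divergence(labeled_positive, unlabeled)
```

The divergence is zero when the two distributions are identical and grows as they differ. It is also asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally give different values.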

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, H., Sha, C., Wang, X., Zhou, A. (2012). Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_3

  • DOI: https://doi.org/10.1007/978-3-642-29253-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
