A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples

  • Conference paper
Information Retrieval Technology (AIRS 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4993))

Abstract

This paper investigates a new approach for training text classifiers when only a small set of positive examples is available together with a large set of unlabeled examples. The key feature of this problem is that there are no negative examples for learning. Recently, a few techniques have been reported that build a classifier in two steps. In this paper, we introduce a novel method for the first step, which clusters the unlabeled and positive examples to identify reliable negative documents; an SVM is then run iteratively. We perform a comprehensive evaluation against two other methods and show experimentally that the proposed method is efficient and effective.
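The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm, which rule for marking a cluster as negative, or which SVM settings were used. The sketch assumes plain k-means with a deterministic farthest-first start, treats every unlabeled document in a cluster that contains no labeled positive as a reliable negative, and uses an unregularized hinge-loss linear classifier as a stand-in for a real SVM solver. All function names (`kmeans`, `reliable_negatives`, `train_hinge`, `predict`, `pu_learn`) are hypothetical, and documents are assumed to be already converted to numeric feature tuples.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Lloyd's k-means with farthest-first initialization (assumed choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def reliable_negatives(P, U, k):
    """Step 1 (assumed rule): unlabeled points in clusters with no labeled positive."""
    labels = kmeans(P + U, k)
    pos_clusters = set(labels[:len(P)])
    return [u for u, l in zip(U, labels[len(P):]) if l not in pos_clusters]


def train_hinge(X, y, lr=0.1, epochs=200):
    """Hinge-loss subgradient descent without regularization (SVM stand-in)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        updated = False
        for x, t in zip(X, y):
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                updated = True
        if not updated:  # every training point has margin >= 1
            break
    return w, b


def predict(model, x):
    """Return +1 or -1 for a feature tuple under a (w, b) linear model."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def pu_learn(P, U, k=2):
    """Step 2: retrain, moving newly predicted negatives into RN until none remain."""
    RN = reliable_negatives(P, U, k)
    Q = [u for u in U if u not in RN]
    while True:
        model = train_hinge(P + RN, [1] * len(P) + [-1] * len(RN))
        moved = [q for q in Q if predict(model, q) == -1]
        if not moved:
            return model, RN
        RN += moved
        Q = [q for q in Q if q not in moved]
```

Calling `pu_learn(P, U, k)` with a list of positive feature tuples `P` and unlabeled tuples `U` returns the final linear model and the accumulated reliable-negative set. In practice the number of clusters and the cluster-selection rule would need tuning, and a regularized SVM would replace `train_hinge`.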

Editor information

Hang Li Ting Liu Wei-Ying Ma Tetsuya Sakai Kam-Fai Wong Guodong Zhou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, B., Zuo, W. (2008). A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_37

  • DOI: https://doi.org/10.1007/978-3-540-68636-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68633-0

  • Online ISBN: 978-3-540-68636-1

  • eBook Packages: Computer Science, Computer Science (R0)
