
Abstract

In many real-world machine learning tasks, labeled training examples are expensive to obtain, while unlabeled examples are plentiful. Text classification is one such class of learning problems. Active learning aims to reduce the required labeling effort while retaining accuracy by intelligently selecting which examples to label. However, few comparisons between different active learning methods exist, and the effect of the ratio of positive to negative examples on the accuracy of such algorithms has also received little attention. This paper compares two of the most promising methods and evaluates their performance on a range of categories from the Reuters Corpus Volume 1 (RCV1) news article dataset.
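The abstract describes active learning only in general terms and does not name the two methods being compared. To illustrate the general idea, the Python sketch below implements one common strategy, pool-based uncertainty sampling. A classifier is retrained on the current labeled set, and the unlabeled example with the smallest absolute score (the one closest to the decision boundary) is sent to a labeling oracle. This is not the authors' method. The logistic-regression learner, the synthetic two-dimensional points standing in for documents, and the function names (`train_logistic`, `most_uncertain`, `active_learning`) are all hypothetical choices made for this example.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

Vector = List[float]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w.x + b; the last entry of w is the bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def sigmoid(z: float) -> float:
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: List[Vector], ys: List[int],
                   epochs: int = 300, lr: float = 0.5) -> Vector:
    """Hypothetical learner: logistic regression fit by batch gradient descent (ys in {0, 1})."""
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, x)) - y  # derivative of log loss w.r.t. the score
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w


def most_uncertain(w: Vector, pool: List[Vector], labeled: Set[int]) -> int:
    """Uncertainty sampling: pick the unlabeled example closest to the decision boundary."""
    candidates = [i for i in range(len(pool)) if i not in labeled]
    return min(candidates, key=lambda i: abs(score(w, pool[i])))


def active_learning(pool: List[Vector], oracle: Callable[[int], int],
                    seed: List[int], budget: int) -> Tuple[Vector, List[int]]:
    """Pool-based loop: retrain, query one label, repeat `budget` times."""
    labeled = list(seed)
    for _ in range(budget):
        w = train_logistic([pool[i] for i in labeled], [oracle(i) for i in labeled])
        labeled.append(most_uncertain(w, pool, set(labeled)))
    w = train_logistic([pool[i] for i in labeled], [oracle(i) for i in labeled])
    return w, labeled


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[rng.random(), rng.random()] for _ in range(200)]
    # Synthetic stand-in for a category: "positive" when x0 + x1 > 1.
    truth = [1 if x[0] + x[1] > 1.0 else 0 for x in pool]
    seed = [truth.index(1), truth.index(0)]  # one positive and one negative example
    w, labeled = active_learning(pool, lambda i: truth[i], seed, budget=20)
    acc = sum((score(w, x) > 0) == bool(y) for x, y in zip(pool, truth)) / len(pool)
    print(f"labeled {len(labeled)} of {len(pool)} examples, accuracy {acc:.2f}")
```

In a text-classification setting, the points would be replaced by document vectors and the `oracle` lambda by a human annotator. Changing the share of positive examples in `truth` is one simple way to probe the class-ratio effect that the abstract mentions.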

Copyright information

© 2006 Springer Berlin · Heidelberg

About this paper

Cite this paper

Novak, B., Mladenič, D., Grobelnik, M. (2006). Text Classification with Active Learning. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_48
