Text Classification with Active Learning

Novak, Blaž; Mladenič, Dunja; Grobelnik, Marko

doi:10.1007/3-540-31314-1_48

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In many real-world machine learning tasks, labeled training examples are expensive to obtain, while large numbers of unlabeled examples are readily available. Text classification is one such class of learning problems. Active learning strives to reduce the required labeling effort while retaining accuracy by intelligently selecting the examples to be labeled. However, very few comparisons exist between different active learning methods, and the effect of the ratio of positive to negative examples on the accuracy of such algorithms has also received very little attention. This paper presents a comparison of the two most promising methods and their performance on a range of categories from the Reuters Corpus Volume 1 news article dataset.
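
The abstract does not name the two methods compared, so the following is only a minimal sketch of what "intelligently selecting the examples to be labeled" can mean in practice. It assumes pool-based uncertainty sampling, one common active learning strategy, and is not taken from the paper. The names `most_uncertain`, `active_learning_round`, `train`, `predict_proba`, and `oracle` are hypothetical placeholders for a classifier and a human annotator.

```python
from typing import Any, Callable, Dict, List, Sequence


def most_uncertain(probs: Sequence[float], k: int = 1) -> List[int]:
    """Return positions of the k probabilities closest to 0.5.

    A predicted P(positive) near 0.5 means the current model is least
    sure about that example, so its label is assumed to be most useful.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def active_learning_round(
    pool: Sequence[Any],
    labeled: Dict[int, int],
    train: Callable[[List[Any], List[int]], Any],
    predict_proba: Callable[[Any, Any], float],
    oracle: Callable[[Any], int],
    batch_size: int = 1,
) -> Dict[int, int]:
    """Run one query round and return the enlarged labeled set.

    `labeled` maps pool indices to known labels (0 or 1). The model is
    trained on the labeled examples, the unlabeled pool is scored, and
    the oracle (hypothetical annotator) labels the most uncertain items.
    """
    labeled = dict(labeled)  # copy so the caller's dict is not mutated
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    if not unlabeled:
        return labeled
    indices = list(labeled)
    model = train([pool[i] for i in indices], [labeled[i] for i in indices])
    probs = [predict_proba(model, pool[i]) for i in unlabeled]
    for position in most_uncertain(probs, batch_size):
        idx = unlabeled[position]
        labeled[idx] = oracle(pool[idx])
    return labeled
```

Calling `active_learning_round` repeatedly, and tracking accuracy on a held-out set after each call, gives the kind of labeling-effort versus accuracy curve on which different selection strategies can be compared.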

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Blaž Novak, Dunja Mladenič & Marko Grobelnik

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Novak, B., Mladenič, D., Grobelnik, M. (2006). Text Classification with Active Learning. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_48

DOI: https://doi.org/10.1007/3-540-31314-1_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)
