Abstract
In many real-world machine learning tasks, labeled training examples are expensive to obtain, while at the same time a large number of unlabeled examples are available. One such class of learning problems is text classification. Active learning strives to reduce the required labeling effort while retaining accuracy by intelligently selecting the examples to be labeled. However, very little comparison exists between different active learning methods, and the effect of the ratio of positive to negative examples on the accuracy of such algorithms has also received very little attention. This paper presents a comparison of two of the most promising methods and their performance on a range of categories from the Reuters Corpus Vol. 1 news article dataset.
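The "intelligent selection" at the core of active learning is often implemented as pool-based uncertainty sampling: train a classifier on the current labeled set, then query the label of the unlabeled example the classifier is least sure about. The sketch below illustrates this idea only; it uses a toy nearest-centroid classifier and a hypothetical margin score, not the paper's actual classifiers or selection criteria.

```python
import math

# Toy illustration of pool-based active learning with uncertainty
# sampling. The nearest-centroid model and the |d0 - d1| margin are
# illustrative assumptions, not the method compared in the paper.

def train_centroids(labeled):
    """labeled: list of (vector, label) pairs with labels in {0, 1}.
    Returns the per-class mean vector."""
    centroids = {}
    for label in (0, 1):
        vecs = [v for v, y in labeled if y == label]
        centroids[label] = [sum(xs) / len(vecs) for xs in zip(*vecs)]
    return centroids

def margin(vec, centroids):
    """Absolute difference of the distances to the two class
    centroids; a small value means the model is uncertain."""
    d0 = math.dist(vec, centroids[0])
    d1 = math.dist(vec, centroids[1])
    return abs(d0 - d1)

def select_query(pool, centroids):
    """Index of the unlabeled example with the smallest margin,
    i.e. the one whose label would be requested next."""
    return min(range(len(pool)), key=lambda i: margin(pool[i], centroids))

if __name__ == "__main__":
    labeled = [([0.0, 0.0], 0), ([4.0, 4.0], 1)]
    pool = [[2.0, 2.0], [5.0, 5.0], [0.0, 1.0]]
    centroids = train_centroids(labeled)
    print(select_query(pool, centroids))  # the point equidistant from both classes
```

In a full active learning loop, the selected example would be labeled by an oracle, added to the labeled set, and the model retrained; the loop repeats until the labeling budget is exhausted.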
© 2006 Springer Berlin · Heidelberg
Cite this paper
Novak, B., Mladenič, D., Grobelnik, M. (2006). Text Classification with Active Learning. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_48
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4