Abstract
Active learning refers to the task of devising a ranking function that, given a classifier trained from relatively few training examples, ranks a set of additional unlabeled examples in terms of how much further information they would carry, once manually labeled, for retraining a (hopefully) better classifier. Research on active learning in text classification has so far concentrated on single-label classification; active learning for multi-label classification, instead, has either been tackled in a simulated (and, we contend, non-realistic) way, or neglected tout court. In this paper we aim to fill this gap by examining a number of realistic strategies for tackling active learning for multi-label classification. Each such strategy consists of a rule for combining the outputs returned by the individual binary classifiers as a result of classifying a given unlabeled document. We present the results of extensive experiments in which we test these strategies on two standard text classification datasets.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Machine Learning 15(2), 201–221 (1994)
Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR 1994), Dublin, IE, pp. 3–12 (1994)
Lewis, D.D.: Reuters-21578 text categorization test collection Distribution 1.0 README file, v 1.3 (2004)
Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research 5, 361–397 (2004)
Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., Ma, W.: Support vector machines classification with a very large-scale taxonomy. SIGKDD Explorations 7(1), 36–43 (2005)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. Journal of Machine Learning Research 2, 45–66 (2001)
Davy, M., Luz, S.: Active learning with history-based query selection for text categorisation. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 695–698. Springer, Heidelberg (2007)
Liere, R., Tadepalli, P.: Active learning with committees for text categorization. In: Proceedings of the 14th Conference of the American Association for Artificial Intelligence (AAAI 1997), Providence, US, pp. 591–596 (1997)
McCallum, A.K., Nigam, K.: Employing EM in pool-based active learning for text classification. In: Proceedings of the 15th International Conference on Machine Learning (ICML1998), Madison, US, pp. 350–358 (1998)
Xu, Z., Yu, K., Tresp, V., Xu, X., Wang, J.: Representative sampling for text classification using support vector machines. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 393–407. Springer, Heidelberg (2003)
Esuli, A., Fagni, T., Sebastiani, F.: MP-boost: A multiple-pivot boosting algorithm and its application to text categorization. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 1–12. Springer, Heidelberg (2006)
Schapire, R.E., Singer, Y.: Boostexter: A boosting-based system for text categorization. Machine Learning 39(2/3), 135–168 (2000)
Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR 1999), Berkeley, US, pp. 42–49 (1999)
Hoi, S.C.H., Jin, R., Lyu, M.R.: Large-scale text categorization by batch mode active learning. In: Proceedings of the 15th International Conference on World Wide Web (WWW 2006), Edinburgh, UK, pp. 633–642 (2006)
Raghavan, H., Madani, O., Jones, R.: InterActive feature selection. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, UK, pp. 841–846 (2005)
Raghavan, H., Madani, O., Jones, R.: Active learning with feedback on features and instances. Journal of Machine Learning Research 7, 1655–1686 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esuli, A., Sebastiani, F. (2009). Active Learning Strategies for Multi-Label Text Classification. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-00958-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer ScienceComputer Science (R0)