Active Learning Strategies for Multi-Label Text Classification

Esuli, Andrea; Sebastiani, Fabrizio

doi:10.1007/978-3-642-00958-7_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Active learning refers to the task of devising a ranking function that, given a classifier trained from relatively few training examples, ranks a set of additional unlabeled examples in terms of how much further information they would carry, once manually labeled, for retraining a (hopefully) better classifier. Research on active learning in text classification has so far concentrated on single-label classification; active learning for multi-label classification, instead, has either been tackled in a simulated (and, we contend, non-realistic) way, or neglected tout court. In this paper we aim to fill this gap by examining a number of realistic strategies for tackling active learning for multi-label classification. Each such strategy consists of a rule for combining the outputs returned by the individual binary classifiers as a result of classifying a given unlabeled document. We present the results of extensive experiments in which we test these strategies on two standard text classification datasets.
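
As a rough illustration of what "a rule for combining the outputs returned by the individual binary classifiers" could look like, the sketch below ranks unlabeled documents by a single score derived from their per-label classifier outputs. It is not taken from the paper: the function name `rank_for_labeling`, the `COMBINERS` table, the choice of min/avg/max as rules, and the reading of the absolute classifier output as confidence are all assumptions made for this example.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical combination rules (assumed for illustration): each maps the
# per-label confidences of one document to a single number.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "avg": mean,
    "max": max,
}


def rank_for_labeling(scores: Dict[str, Sequence[float]], rule: str = "min") -> List[str]:
    """Return document ids ordered from least to most confident.

    `scores` maps a document id to the signed outputs of the binary
    classifiers (one per label). The absolute value of each output is
    read as that classifier's confidence, which is an assumption here.
    """
    combine = COMBINERS[rule]
    combined = {
        doc_id: combine([abs(s) for s in outputs])
        for doc_id, outputs in scores.items()
    }
    # Uncertainty-style selection: low combined confidence is ranked first.
    return sorted(combined, key=lambda doc_id: combined[doc_id])


# Toy pool of three unlabeled documents scored against three labels.
pool = {
    "d1": [0.9, -0.8, 0.7],
    "d2": [0.1, -0.9, 0.95],
    "d3": [0.5, -0.4, 0.45],
}
ranking_min = rank_for_labeling(pool, "min")
ranking_avg = rank_for_labeling(pool, "avg")
```

On this toy pool the two rules disagree about which document to label first: "min" favors `d2`, which has one very uncertain label, while "avg" favors `d3`, which is moderately uncertain on every label. That disagreement is the kind of design choice a multi-label strategy has to settle.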

Author information

Authors and Affiliations

Istituto di Scienza e Tecnologia dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi 1, 56124, Pisa, Italy
Andrea Esuli & Fabrizio Sebastiani

Editor information

Editors and Affiliations

Université de Toulouse - IRIT, 118 Route de Narbonne, 31062, Toulouse Cedex 4, France
Mohand Boughanem
Laboratoire d’Informatique de Grenoble, BP 53, Université Joseph Fourier, 38041, Grenoble Cedex 9, France
Catherine Berrut
Université de Toulouse - IRIT, 118 Route de Narbonne, 31062, Toulouse Cedex 4, France
Josiane Mothe & Chantal Soule-Dupuy

About this paper

Cite this paper

Esuli, A., Sebastiani, F. (2009). Active Learning Strategies for Multi-Label Text Classification. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_12

DOI: https://doi.org/10.1007/978-3-642-00958-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)
