Abstract
LeQua 2022 is a new lab for the evaluation of methods for “learning to quantify” in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could easily be obtained by first classifying all documents via a text classifier and then counting how many documents are assigned to each class, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form.
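To make the contrast above concrete, here is a minimal sketch (ours, using scikit-learn; not code distributed by the lab) of the naive “classify and count” strategy and of one well-known correction, the adjusted classify and count method of Forman [8], for the binary case; function names are illustrative, and labels are assumed to be in {0, 1}.

```python
# Minimal sketch: "classify and count" (CC) vs. adjusted classify and count (ACC).
# Assumes binary labels in {0, 1} and a fitted scikit-learn classifier.
import numpy as np
from sklearn.model_selection import cross_val_predict

def classify_and_count(clf, X):
    # Naive estimate: the fraction of documents the classifier labels positive.
    return clf.predict(X).mean()

def adjusted_classify_and_count(clf, X, X_train, y_train):
    # Estimate the classifier's true/false positive rates on held-out folds.
    y_hat = cross_val_predict(clf, X_train, y_train, cv=10)
    tpr = y_hat[y_train == 1].mean()
    fpr = y_hat[y_train == 0].mean()
    cc = classify_and_count(clf, X)
    # Forman's correction (assumes tpr > fpr), clipped to a legal prevalence.
    return np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0)
```

Under prior probability shift, CC systematically pulls its estimates towards the training prevalence; ACC corrects the raw count using the classifier's estimated error rates.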
Notes
- 1. One reason why KLD is undesirable is that it penalizes underestimation and overestimation differently; another is that it is scarcely robust to outliers. See [19, §4.7 and §5.2] for a detailed discussion of these and other reasons; a small numeric illustration of the asymmetry is given after these notes.
- 2. Everything we say here about how we generate the test samples also applies to how we generate the development samples.
- 3. Other seemingly correct methods, such as drawing n values uniformly at random from the interval [0,1] and then normalizing them so that they sum up to 1, tend to produce a set of samples that is biased towards the centre of the unit \((n-1)\)-simplex, for reasons discussed in [20]; see the sampling sketch after these notes.
- 4. The set of 28 topic classes is flat, i.e., there is no hierarchy defined upon it.
- 5.
- 6. See the branch https://github.com/HLT-ISTI/QuaPy/tree/lequa2022; a minimal usage sketch is given after these notes.
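The asymmetry claimed in note 1 is easy to verify numerically. The following sketch (ours, not from the lab materials) computes binary KLD for an underestimate and an overestimate of the same magnitude:

```python
# Binary KLD between a true prevalence p and a predicted prevalence p_hat.
import numpy as np

def kld(p, p_hat):
    return p * np.log(p / p_hat) + (1 - p) * np.log((1 - p) / (1 - p_hat))

print(kld(0.10, 0.05))  # underestimating by 0.05: ~0.0207
print(kld(0.10, 0.15))  # overestimating by 0.05:  ~0.0109
# As p_hat -> 0 with p > 0, kld diverges: a single badly estimated sample
# can dominate the average, hence the poor robustness to outliers.
```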
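The bias mentioned in note 3, and the remedy analyzed in [20] (the Kraemer algorithm, which samples uniformly from the unit simplex by taking the gaps between sorted uniform draws), can be sketched as follows; function names are ours:

```python
# Two ways of drawing a prevalence vector over n classes; only the second is
# uniform on the unit (n-1)-simplex (Kraemer algorithm, see [20]).
import numpy as np

rng = np.random.default_rng(0)

def naive_prevalences(n):
    # Biased: normalized uniform draws cluster around the centre (1/n, ..., 1/n).
    v = rng.uniform(size=n)
    return v / v.sum()

def kraemer_prevalences(n):
    # Uniform: differences between consecutive sorted uniform draws in [0, 1].
    cuts = np.sort(rng.uniform(size=n - 1))
    return np.diff(np.concatenate(([0.0], cuts, [1.0])))
```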
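As for note 6, the following is a minimal usage sketch loosely adapted from QuaPy's general documentation; we have not verified it against the lequa2022 branch, whose LeQua-specific entry points may differ, and the dataset-loading call is only illustrative.

```python
# Sketch of training and evaluating a quantifier with QuaPy (interface as in
# the library's general README; the lequa2022 branch may differ).
import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)

model = qp.method.aggregative.ACC(LogisticRegression())  # adjusted classify & count
model.fit(dataset.training)

estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence = dataset.test.prevalence()
print(qp.error.mae(true_prevalence, estim_prevalence))   # mean absolute error
```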
References
1. Alaíz-Rodríguez, R., Guerrero-Curieses, A., Cid-Sueiro, J.: Class and subclass probability re-estimation to adapt a classifier in the presence of concept drift. Neurocomputing 74(16), 2614–2623 (2011)
2. Card, D., Smith, N.A.: The importance of calibration for estimating proportions from annotations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2018), New Orleans, US, pp. 1636–1646 (2018)
3. Da San Martino, G., Gao, W., Sebastiani, F.: Ordinal text quantification. In: Proceedings of the 39th ACM Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, IT, pp. 937–940 (2016)
4. del Coz, J.J., González, P., Moreo, A., Sebastiani, F.: Learning to quantify: methods and applications (LQ 2021). In: Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), Gold Coast, AU (2021). Forthcoming
5. du Plessis, M.C., Niu, G., Sugiyama, M.: Class-prior estimation for learning from positive and unlabeled data. Mach. Learn. 106(4), 463–492 (2016). https://doi.org/10.1007/s10994-016-5604-6
6. Esuli, A., Moreo, A., Sebastiani, F.: A recurrent neural network for sentiment quantification. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, IT, pp. 1775–1778 (2018)
7. Esuli, A., Sebastiani, F.: Optimizing text quantifiers for multivariate loss functions. ACM Trans. Knowl. Discov. Data 9(4), Article 27 (2015)
8. Forman, G.: Quantifying counts and costs via classification. Data Min. Knowl. Disc. 17(2), 164–206 (2008)
9. Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis. Soc. Netw. Anal. Min. 6(1), 1–22 (2016). https://doi.org/10.1007/s13278-016-0327-z
10. González, P., Castaño, A., Chawla, N.V., del Coz, J.J.: A review on quantification learning. ACM Comput. Surv. 50(5), 74:1–74:40 (2017)
11. Higashinaka, R., Funakoshi, K., Inaba, M., Tsunomori, Y., Takahashi, T., Kaji, N.: Overview of the 3rd dialogue breakdown detection challenge. In: Proceedings of the 6th Dialog System Technology Challenge (2017)
12. Hopkins, D.J., King, G.: A method of automated nonparametric content analysis for social science. Am. J. Polit. Sci. 54(1), 229–247 (2010)
13. King, G., Lu, Y.: Verbal autopsy methods with multiple causes of death. Stat. Sci. 23(1), 78–91 (2008)
14. Levin, R., Roitman, H.: Enhanced probabilistic classify and count methods for multi-label text quantification. In: Proceedings of the 7th ACM International Conference on the Theory of Information Retrieval (ICTIR 2017), Amsterdam, NL, pp. 229–232 (2017)
15. Moreno-Torres, J.G., Raeder, T., Alaíz-Rodríguez, R., Chawla, N.V., Herrera, F.: A unifying view on dataset shift in classification. Pattern Recogn. 45(1), 521–530 (2012)
16. Moreo, A., Esuli, A., Sebastiani, F.: QuaPy: a Python-based framework for quantification. In: Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), Gold Coast, AU (2021). Forthcoming
17. Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 Task 4: sentiment analysis in Twitter. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), San Diego, US, pp. 1–18 (2016)
18. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D. (eds.): Dataset Shift in Machine Learning. The MIT Press, Cambridge (2009)
19. Sebastiani, F.: Evaluation measures for quantification: an axiomatic approach. Inf. Retrieval J. 23(3), 255–288 (2020)
20. Smith, N.A., Tromble, R.W.: Sampling uniformly from the unit simplex. Unpublished manuscript (2004). https://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
21. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
22. Zeng, Z., Kato, S., Sakai, T.: Overview of the NTCIR-14 short text conversation task: dialogue quality and nugget detection subtasks. In: Proceedings of NTCIR-14, pp. 289–315 (2019)
23. Zeng, Z., Kato, S., Sakai, T., Kang, I.: Overview of the NTCIR-15 dialogue evaluation task (DialEval-1). In: Proceedings of NTCIR-15, pp. 13–34 (2020)
Acknowledgments
This work has been supported by the SoBigData++ project, funded by the European Commission (Grant 871042) under the H2020 Programme INFRAIA-2019-1, and by the AI4Media project, funded by the European Commission (Grant 951911) under the H2020 Programme ICT-48-2020. The authors’ opinions do not necessarily reflect those of the European Commission. We thank Alberto Barrón-Cedeño, Juan José del Coz, Preslav Nakov, and Paolo Rosso for their advice on how best to set up this lab.