
Modelling Randomness in Relevance Judgments and Evaluation Measures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Abstract

We propose a general stochastic approach which defines relevance as a set of binomial random variables, where the expectation p of each variable indicates the quantity of relevance for each relevance grade. This represents a first step towards modelling evaluation measures as transformations of random variables, turning them into random evaluation measures. We show that a consequence of this new approach is to remove the distinction between binary and multi-graded measures and, at the same time, to deal with incomplete information, providing a single unified framework for all these different aspects. We experiment on TREC collections to show how these new random measures correlate with existing ones and what desirable properties they exhibit, such as robustness to pool downsampling and discriminative power.
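The sketch below (Python, not taken from the paper) illustrates the general idea under simple assumptions: each judged document's relevance is drawn from a Bernoulli variable whose expectation p is derived from its relevance grade, and a measure such as average precision computed on the sampled binary labels becomes a random variable whose mean and variance can be estimated by Monte Carlo sampling. The grade-to-p mapping, the simplified measure, and all function names are illustrative assumptions, not the authors' formulation.

    # Illustrative sketch only (not the paper's method): graded relevance as
    # Bernoulli random variables, and an evaluation measure computed on sampled
    # binary labels treated as a random variable estimated by Monte Carlo.
    import random

    def grade_to_p(grade, max_grade=3):
        """Map a relevance grade to the expectation p of its Bernoulli variable
        (linear mapping chosen purely for illustration)."""
        return grade / max_grade

    def average_precision(binary_rels):
        """Simplified average precision over a ranked list of 0/1 labels,
        normalised by the number of relevant documents retrieved."""
        hits, precisions = 0, []
        for rank, rel in enumerate(binary_rels, start=1):
            if rel:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / max(hits, 1)

    def sampled_measure(graded_run, n_samples=10000, max_grade=3,
                        measure=average_precision):
        """Estimate mean and variance of the 'random' measure by repeatedly
        sampling a binary relevance vector from the per-document Bernoullis."""
        ps = [grade_to_p(g, max_grade) for g in graded_run]
        samples = []
        for _ in range(n_samples):
            binary = [1 if random.random() < p else 0 for p in ps]
            samples.append(measure(binary))
        mean = sum(samples) / n_samples
        var = sum((s - mean) ** 2 for s in samples) / n_samples
        return mean, var

    if __name__ == "__main__":
        # A ranked run judged on a 0..3 scale; unjudged documents could be
        # assigned an intermediate p instead of being treated as non-relevant.
        run_grades = [3, 0, 2, 1, 0, 3]
        mean_ap, var_ap = sampled_measure(run_grades)
        print(f"expected AP ~ {mean_ap:.3f}, variance ~ {var_ap:.4f}")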

Author information

Corresponding author

Correspondence to Nicola Ferro.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Ferrante, M., Ferro, N., Pontarollo, S. (2018). Modelling Randomness in Relevance Judgments and Evaluation Measures. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_15

  • DOI: https://doi.org/10.1007/978-3-319-76941-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76940-0

  • Online ISBN: 978-3-319-76941-7

  • eBook Packages: Computer Science, Computer Science (R0)
