Abstract
Information retrieval (IR) system evaluation aims at comparing IR systems either (1) to one another with respect to a single test collection, or (2) across multiple collections. In the first case, the evaluation environment (test collection and evaluation metrics) stays the same, while in the second case the environment changes. Different evaluation environments may, in fact, be seen as evolutionary versions of some given evaluation environment. In this work, we propose a methodology to predict a statistically significant change in the performance of an IR system (i.e. result delta \(\mathcal {R}\varDelta \)) by quantifying the differences between test collections (i.e. knowledge delta \(\mathcal {K}\varDelta \)). In a first phase, we quantify the differences between the document collections (i.e. \(\mathcal {K}_{d}\varDelta \)) of the test collections by means of TF-IDF and Language Model (LM) representations. We use the \(\mathcal {K}_{d}\varDelta \) to train SVM classification models that predict statistically significant performance changes of various IR systems on evolving test collections derived from the Robust and TREC-COVID collections. We evaluate our approach against our previous \(\mathcal {K}_{d}\varDelta \) experiments.
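The first phase described in the abstract — representing each document collection and taking the difference between representations as \(\mathcal {K}_{d}\varDelta \) features — can be illustrated with a minimal pure-Python sketch. This is a toy illustration, not the authors' implementation: the collection-level TF-IDF aggregation, the toy data, and the element-wise difference are all assumptions made here for clarity; note 2's shared vocabulary across collections is the one detail taken from the paper.

```python
import math
from collections import Counter

def tfidf_vector(docs, vocab):
    """Aggregate TF-IDF vector for a document collection.
    docs: list of token lists; vocab: vocabulary shared across all
    collections (note 2: a common vocabulary makes them comparable)."""
    n = len(docs)
    df = Counter()  # document frequency per term
    tf = Counter()  # collection-level term frequency
    for doc in docs:
        df.update(set(doc))
        tf.update(doc)
    # Smoothed IDF; the exact weighting scheme is an assumption here.
    return [tf[t] * math.log((1 + n) / (1 + df[t])) for t in vocab]

def kd_delta(vec_a, vec_b):
    """Knowledge delta: element-wise difference between the two
    collection representations, usable as classifier features."""
    return [a - b for a, b in zip(vec_a, vec_b)]

# Hypothetical toy collections, for illustration only.
coll_a = [["virus", "vaccine"], ["vaccine", "trial"]]
coll_b = [["virus", "mask"], ["mask", "policy"]]
vocab = sorted({t for d in coll_a + coll_b for t in d})
delta = kd_delta(tfidf_vector(coll_a, vocab), tfidf_vector(coll_b, vocab))
```

In the full pipeline such delta vectors would be fed to an SVM classifier that predicts whether a system's performance change across the two collections is statistically significant.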
Notes
- 1. Recall that, in our work, a test collection TC together with a set of appropriate metrics forms an Evaluation Environment, EE.
- 2. In order for the test collections to be comparable, we take as our vocabulary all tokens across all test collections.
- 3. where \(L = Q1 - 1.5 * (Q3 - Q1)\) and \(U = Q3 + 1.5 * (Q3 - Q1)\).
- 4. Where we apply a min-max normalization to the entries of these rows.
- 5. In our previous work we defined different types of \(\mathcal {R}\varDelta \); in this paper, \(\mathcal {R}\varDelta \) coincides with \(\mathcal {R}_{e}\varDelta \) in [5].
- 6.
- 7.
- 8. The full set of results for the 56 classifiers can be found here: https://owncloud.tuwien.ac.at/index.php/s/opUP9QlFEUHlfsx.
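Notes 3 and 4 can be made concrete with a short sketch: note 3's \(L\) and \(U\) are the standard Tukey fences for outlier detection, and note 4 applies min-max normalization to rows of the feature matrix. The quartile interpolation method and the helper names below are assumptions; the paper specifies only the fence and normalization formulas.

```python
def iqr_fences(values):
    """Tukey fences from note 3: L = Q1 - 1.5*(Q3-Q1), U = Q3 + 1.5*(Q3-Q1).
    Quartiles via linear-interpolation percentiles (an assumption; the
    paper does not specify the quartile method)."""
    def percentile(sorted_vals, p):
        k = (len(sorted_vals) - 1) * p
        lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
        return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)
    s = sorted(values)
    q1, q3 = percentile(s, 0.25), percentile(s, 0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def min_max(row):
    """Min-max normalization from note 4, mapping entries to [0, 1];
    constant rows map to all zeros (a defensive choice made here)."""
    lo, hi = min(row), max(row)
    return [(x - lo) / (hi - lo) for x in row] if hi > lo else [0.0] * len(row)
```

For example, in `[1, 2, 3, 4, 100]` the fences are \(L = -1\) and \(U = 7\), so 100 falls outside the upper fence and would be flagged as an outlier.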
References
Amati, G.: Frequentist and Bayesian approach to information retrieval. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds.) ECIR 2006. Lecture Notes in Computer Science, vol. 3936, pp. 13–24. Springer, Berlin (2006). https://doi.org/10.1007/11735106_3
Ferro, N., Kim, Y., Sanderson, M.: Using collection shards to study retrieval performance effect sizes. ACM Trans. Inf. Syst. (TOIS) 37(3), 1–40 (2019)
Ferro, N., Silvello, G.: Towards an anatomy of IR system component performances. J. Assoc. Inf. Sci. Technol. 69, 187–200 (2018). https://doi.org/10.1002/asi.23910
Galuščáková, P., et al.: LongEval-Retrieval: French-English dynamic test collection for continuous web search evaluation. arXiv preprint arXiv:2303.03229 (2023)
González-Sáez, G.N., Mulhem, P., Goeuriot, L.: Towards the evaluation of information retrieval systems on evolving datasets with pivot systems. In: Candan, K.S., et al. (eds.) CLEF 2021. Lecture Notes in Computer Science, vol. 12880, pp. 91–102. Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-85251-1_8
González-Sáez, G., et al.: Towards result delta prediction based on knowledge deltas for continuous IR evaluation. In: Faggioli, G., Ferro, N., Mothe, J., Raiber, F. (eds.) QPP++ 2023: Query Performance Prediction and Its Evaluation in New Tasks Workshop, CEUR Workshop Proceedings, vol. 3366, pp. 20–24, Aachen (2023). http://ceur-ws.org/Vol-3366/#paper-04
Hauff, C.: Predicting the effectiveness of queries and retrieval systems. In: SIGIR Forum, vol. 44, p. 88 (2010)
He, B., Ounis, I.: Query performance prediction. Inf. Syst. 31(7), 585–594 (2006). https://doi.org/10.1016/j.is.2005.11.003
Heafield, K., Pouzyrevsky, I., Clark, J.H., Koehn, P.: Scalable modified Kneser-Ney language model estimation. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 690–696 (2013)
Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Prentice-Hall Inc, Upper Saddle River (2009)
Kanoulas, E.: A short survey on online and offline methods for search quality evaluation. In: Russian Summer School on Information Retrieval (2015)
Macdonald, C., Tonellotto, N.: Declarative experimentation in information retrieval using PyTerrier. In: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, pp. 161–168 (2020)
Rashidi, L., Zobel, J., Moffat, A.: Evaluating the predictivity of IR experiments. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021, pp. 1667–1671. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3404835.3463040
Rogers, A., Kovaleva, O., Rumshisky, A.: A primer in BERTology: what we know about how BERT works. Trans. Assoc. Comput. Linguist. 8, 842–866 (2021). https://doi.org/10.1162/tacl_a_00349
Sanderson, M.: Test collection based evaluation of information retrieval systems. Now Publishers Inc (2010)
Sanderson, M., Turpin, A., Zhang, Y., Scholer, F.: Differences in effectiveness across sub-collections. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, pp. 1965–1969. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2396761.2398553
Voorhees, E., et al.: TREC-COVID: constructing a pandemic information retrieval test collection. In: ACM SIGIR Forum, vol. 54, no. 1, pp. 1–12. ACM New York (2021)
Voorhees, E.M.: The TREC 2005 robust track. In: ACM SIGIR Forum, vol. 40, pp. 41–48. ACM, New York (2006)
Wang, L.L., et al.: CORD-19: the COVID-19 open research dataset. arXiv (2020)
Zhao, Y., Scholer, F., Tsegay, Y.: Effective pre-retrieval query performance prediction using similarity and variability evidence. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds.) ECIR 2008. LNCS, vol. 4956, pp. 52–64. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78646-7_8
Acknowledgement
This work is supported by the ANR Kodicare bilateral project, grant ANR-19-CE23-0029 of the French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF), grant I-4471-N.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
El-Ebshihy, A., et al. (2023). Predicting Retrieval Performance Changes in Evolving Evaluation Environments. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol. 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_3