Abstract
Before user simulators can be used to evaluate Interactive Information Retrieval (IIR) systems, the simulators themselves must be evaluated. Previous work proposed a tester-based approach to evaluating user simulators, but it did not address the important question of the reliability of the testers themselves, nor did it study how to derive a single reliability score for a user simulator from multiple testers. In this paper, we address these two limitations and propose a novel Reliability-Aware Tester-based Evaluation (RATE) framework for evaluating the reliability of both user simulators and testers. In this framework, the reliabilities of testers and simulators are jointly learned through unsupervised learning via iterative propagation of reliability. We propose and evaluate two algorithms for this unsupervised learning of reliabilities. Evaluation results on TREC data sets show that the proposed RATE framework is effective in measuring the reliability of simulators and testers, thus serving as a foundation for potentially establishing a new paradigm of evaluating IIR systems with user simulation.
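The iterative propagation of reliability described in the abstract resembles mutual-reinforcement schemes such as HITS: a tester is deemed reliable when reliable simulators score well on it, and a simulator is deemed reliable when it scores well on reliable testers. The following minimal Python sketch illustrates this style of joint estimation under assumed inputs; the score matrix A, the update rule, and the normalization are illustrative placeholders, not the paper's actual algorithms.

    import numpy as np

    def propagate_reliability(A, iters=100, tol=1e-9):
        """Jointly estimate simulator and tester reliabilities by
        iterative mutual reinforcement (a HITS-style sketch).

        A: (n_sims, n_testers) matrix; A[i, j] is an assumed agreement
        score between simulator i and tester j (e.g., how often
        simulator i passes tester j). Illustrative only.
        """
        n_sims, n_testers = A.shape
        sim_rel = np.ones(n_sims) / n_sims           # simulator reliabilities
        tester_rel = np.ones(n_testers) / n_testers  # tester reliabilities

        for _ in range(iters):
            # A simulator is reliable if it scores well on reliable testers.
            new_sim = A @ tester_rel
            # A tester is reliable if reliable simulators score well on it.
            new_tester = A.T @ sim_rel
            # Normalize so the scores stay on a fixed scale across iterations.
            new_sim /= new_sim.sum()
            new_tester /= new_tester.sum()
            converged = (np.abs(new_sim - sim_rel).sum()
                         + np.abs(new_tester - tester_rel).sum()) < tol
            sim_rel, tester_rel = new_sim, new_tester
            if converged:
                break

        return sim_rel, tester_rel

    # Toy usage: 3 simulators evaluated against 4 testers.
    A = np.array([[0.9, 0.8, 0.2, 0.7],
                  [0.8, 0.9, 0.3, 0.6],
                  [0.1, 0.2, 0.9, 0.1]])
    sims, testers = propagate_reliability(A)
    print("simulator reliabilities:", sims)
    print("tester reliabilities:", testers)

In this sketch, L1 normalization keeps the two score vectors comparable across iterations so that convergence can be checked by the total change per step; the paper's two algorithms may differ in both the update rule and the normalization.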
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Labhishetty, S., Zhai, C. (2022). RATE: A Reliability-Aware Tester-Based Evaluation Framework of User Simulators. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_23
DOI: https://doi.org/10.1007/978-3-030-99736-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6
eBook Packages: Computer Science, Computer Science (R0)