
RATE: A Reliability-Aware Tester-Based Evaluation Framework of User Simulators

  • Conference paper

Advances in Information Retrieval (ECIR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13185)

Abstract

Before user simulators can be used to evaluate Interactive Information Retrieval (IIR) systems, the simulators themselves must be evaluated. Previous work has proposed a tester-based approach to evaluating user simulators, but it has not addressed the important question of the reliability of the testers themselves, nor has it studied how to generate a single reliability score for a user simulator based on multiple testers. In this paper, we address these two limitations and propose a novel Reliability-Aware Tester-based Evaluation (RATE) framework for evaluating the reliability of both user simulators and testers. In this framework, the reliability of testers and that of simulators are jointly learned through unsupervised learning using iterative propagation of reliability. We propose and evaluate two algorithms for unsupervised learning of reliabilities. Evaluation results using TREC data sets show that the proposed RATE framework is effective in measuring the reliability of simulators and testers, thus serving as a foundation for potentially establishing a new paradigm for evaluating IIR systems using user simulation.
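The abstract's "iterative propagation of reliability" between testers and simulators can be illustrated with a minimal HITS-style mutual-reinforcement sketch. This is an assumption for illustration only, not the paper's actual RATE algorithms: the matrix `scores[i][j]` (the result of simulator `j` on tester `i`) and the function `propagate_reliability` are hypothetical names, and the update rule (simulator reliability as a tester-reliability-weighted average of test results, tester reliability as agreement with reliable simulators) is one plausible instantiation of the idea.

```python
# Illustrative sketch of iterative reliability propagation (NOT the paper's
# exact RATE algorithm). scores[i][j] is a hypothetical test result in [0, 1]
# of simulator j on tester i, e.g. a pass rate.
def propagate_reliability(scores, iterations=50):
    n_testers = len(scores)
    n_sims = len(scores[0])
    tester_rel = [1.0] * n_testers
    sim_rel = [1.0] * n_sims
    for _ in range(iterations):
        # Simulator reliability: score weighted by the reliability of the
        # testers that produced it.
        sim_rel = [
            sum(tester_rel[i] * scores[i][j] for i in range(n_testers))
            for j in range(n_sims)
        ]
        # Tester reliability: agreement with the currently reliable simulators.
        tester_rel = [
            sum(sim_rel[j] * scores[i][j] for j in range(n_sims))
            for i in range(n_testers)
        ]
        # L1-normalize both vectors so the fixed-point iteration is stable.
        s_sum = sum(sim_rel) or 1.0
        t_sum = sum(tester_rel) or 1.0
        sim_rel = [v / s_sum for v in sim_rel]
        tester_rel = [v / t_sum for v in tester_rel]
    return sim_rel, tester_rel
```

Under this sketch, a simulator that passes the tests of reliable testers accumulates a high reliability score, and a tester whose verdicts agree with reliable simulators is in turn weighted more heavily, which is the mutual-reinforcement intuition behind joint unsupervised learning of both reliabilities.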




Author information

Corresponding author: Sahiti Labhishetty.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Labhishetty, S., Zhai, C. (2022). RATE: A Reliability-Aware Tester-Based Evaluation Framework of User Simulators. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_23


  • DOI: https://doi.org/10.1007/978-3-030-99736-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99735-9

  • Online ISBN: 978-3-030-99736-6

  • eBook Packages: Computer Science, Computer Science (R0)
