DOI: 10.1145/3409256.3409832
Research Article · Best Paper

Exploiting Stopping Time to Evaluate Accumulated Relevance

Published: 14 September 2020

ABSTRACT

Evaluation measures are, more or less explicitly, based on user models that abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model which can, at best, be parametrized. Moreover, they take a deterministic approach which assigns a single precise score to a system run.

In this paper, we take a different angle and, relying on Markov chains and random walks, propose a new family of evaluation measures that accommodate different and flexible user models, allow the interaction of different users to be simulated, and turn the score into a random variable that describes the performance of a system more richly. We also show how the proposed framework instantiates, and helps better explain, state-of-the-art measures such as AP, RBP, DCG, and ERR.
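To make the stopping-time idea concrete, the sketch below is a minimal illustration rather than the paper's actual framework: the function name, parameter values, and relevance gains are hypothetical. It simulates users who scan a ranked list top-down and, after each examined document, continue to the next rank with a fixed probability, i.e. a geometric stopping time like the one underlying RBP's persistence model. Repeating the walk over many simulated users turns the accumulated relevance into an empirical distribution instead of a single deterministic score.

    import random

    def simulate_accumulated_relevance(gains, continue_prob=0.8, n_users=10000, seed=42):
        # Monte Carlo sketch of a random-walk user model over a ranked list.
        # A simulated user scans the ranking top-down and, after examining each
        # document, moves on to the next rank with probability `continue_prob`
        # (a geometric stopping time, as in RBP's persistence model). The utility
        # accumulated up to the stopping time is recorded, so the score of a run
        # becomes an empirical distribution rather than a single number.
        rng = random.Random(seed)
        samples = []
        for _ in range(n_users):
            utility = 0.0
            for gain in gains:
                utility += gain              # accumulate the relevance of the examined document
                if rng.random() > continue_prob:
                    break                    # the user stops here: the (random) stopping time
            samples.append(utility)
        return samples

    # Hypothetical graded-relevance gains for the top 5 documents of a system run.
    run_gains = [1.0, 0.0, 0.5, 1.0, 0.0]
    scores = simulate_accumulated_relevance(run_gains)
    print(sum(scores) / len(scores))         # mean of the empirical score distribution

Under this geometric stopping rule the expected accumulated gain is proportional to RBP (up to its 1-p normalization factor); swapping in a relevance-dependent stop probability would instead approximate the cascade-style user model behind ERR.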


Published in:
ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval
September 2020, 207 pages
ISBN: 9781450380676
DOI: 10.1145/3409256
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
