DOI: 10.1145/3477495.3532787
keynote

Users: Can't Work With Them, Can't Work Without Them?

Published: 07 July 2022

Abstract

If we could design the ideal IR "effectiveness" experiment (as distinct from an IR "efficiency" experiment), what would it look like? It would probably be a lab-based observational study [3] involving multiple search systems masked behind a uniform interface, and with hundreds (or thousands) of users each progressing some "real" search activity they were interested in. And we'd plan to (non-intrusively, somehow) capture per-snippet, per-document, per-SERP, and per-session annotations and satisfaction responses. The collected data could then be compared against a range of measured "task completion quality" indicators, and also against search effectiveness metric scores computed from the elements contained in the SERPs that were served by the systems. That's a tremendously big ask! So we often use offline evaluation techniques instead, employing test collections, static qrels sets, and effectiveness metrics [6]. We abstract the user into a deterministic evaluation script, supposing for pragmatic reasons that we know what query they would issue, and at the same time assuming that we can apply an effectiveness metric to calculate how much usefulness (or satisfaction) they will derive from any given SERP. The great advantage of this approach is that aside from the process of collecting the qrels, it is free of the need for users, meaning that it is repeatable. Indeed, we often do repeat, iterating to set parameters (and to rectify programming errors). Then, once metric scores have been computed, we carry out one or more paired statistical tests and draw conclusions as to relative system effectiveness.
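As one concrete (and entirely hypothetical) instance of that offline pipeline, the sketch below scores two runs with rank-biased precision (RBP) against a small, invented qrels set, and then applies a paired t-test to the per-topic scores. The metric choice, the topics and documents, and the persistence parameter p = 0.8 are illustrative assumptions, not details taken from the keynote.

# A minimal sketch, assuming binary qrels and rank-biased precision (RBP);
# the topics, documents, and runs below are invented for illustration.
from scipy.stats import ttest_rel

def rbp(ranking, rel, p=0.8):
    # RBP models a user who inspects rank i+1 after rank i with probability p;
    # the score is the expected rate at which relevant documents are encountered.
    return (1 - p) * sum(rel.get(doc, 0) * p ** i for i, doc in enumerate(ranking))

# Static qrels: topic -> {document: binary relevance}.
qrels = {
    "q1": {"d1": 1, "d3": 1},
    "q2": {"d2": 1},
    "q3": {"d5": 1},
}

# Two systems' SERPs (ranked document lists) for the same topics.
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d4"], "q3": ["d6", "d5"]}
run_b = {"q1": ["d2", "d1", "d3"], "q2": ["d2", "d4"], "q3": ["d5", "d6"]}

topics = sorted(qrels)
scores_a = [rbp(run_a[q], qrels[q]) for q in topics]
scores_b = [rbp(run_b[q], qrels[q]) for q in topics]

# Paired significance test over per-topic metric scores.
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"mean RBP: A={sum(scores_a)/len(topics):.3f}  "
      f"B={sum(scores_b)/len(topics):.3f}  p={p_value:.3f}")

Note that the persistence parameter p is where the abstracted user re-enters the calculation: it encodes how deep into the SERP the modelled user is assumed to look, the kind of user-model assumption examined in [4, 7].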

References

[1] P. Bailey, A. Moffat, F. Scholer, and P. Thomas. UQV100: A test collection with query variability. In Proc. SIGIR, pages 725--728, 2016. Public data: http://dx.doi.org/10.4225/49/5726E597B8376.
[2] P. Bailey, A. Moffat, F. Scholer, and P. Thomas. Retrieval consistency in the presence of query variations. In Proc. SIGIR, pages 395--404, 2017.
[3] D. Kelly. Methods for evaluating interactive information retrieval systems with users. Found. & Trends in IR, 3(1--2):1--224, 2009.
[4] A. Moffat, P. Bailey, F. Scholer, and P. Thomas. Incorporating user expectations and behavior into the measurement of search effectiveness. ACM Trans. Inf. Sys., 35(3):24:1--24:38, 2017.
[5] A. Moffat, J. Mackenzie, P. Thomas, and L. Azzopardi. A flexible framework for offline effectiveness metrics. In Proc. SIGIR, 2022.
[6] M. Sanderson. Test collection based evaluation of information retrieval systems. Found. & Trends in IR, 4(4):247--375, 2010.
[7] A. F. Wicaksono and A. Moffat. Metrics, user models, and satisfaction. In Proc. WSDM, pages 654--662, 2020.
[8] A. F. Wicaksono and A. Moffat. Modeling search and session effectiveness. Inf. Proc. & Man., 58(4):102601, 2021.

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN:9781450387323
DOI:10.1145/3477495

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. effectiveness metric
  2. offline evaluation
  3. user browsing model

Qualifiers

  • Keynote

Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
