
Incorporating User Expectations and Behavior into the Measurement of Search Effectiveness

Published: 05 June 2017

Abstract

Information retrieval systems aim to help users satisfy information needs. We argue that the goal of the person using the system, and the pattern of behavior that they exhibit as they proceed to attain that goal, should be incorporated into the methods and techniques used to evaluate the effectiveness of IR systems, so that the resulting effectiveness scores have a useful interpretation that corresponds to the users’ search experience. In particular, we investigate the role of search task complexity, and show that it has a direct bearing on the number of relevant answer documents sought by users in response to an information need, suggesting that useful effectiveness metrics must be goal sensitive. We further suggest that user behavior while scanning results listings is affected by the rate at which their goal is being realized, and hence that appropriate effectiveness metrics must be adaptive to the presence (or not) of relevant documents in the ranking. In response to these two observations, we present a new effectiveness metric, INST, that has both of the desired properties: INST employs a parameter T, a direct measure of the user’s search goal that adjusts the top-weightedness of the evaluation score; moreover, as progress towards the target T is made, the modeled user behavior is adapted, to reflect the remaining expectations. INST is experimentally compared to previous effectiveness metrics, including Average Precision (AP), Normalized Discounted Cumulative Gain (NDCG), and Rank-Biased Precision (RBP), demonstrating our claims as to INST’s usefulness. Like RBP, INST is a weighted-precision metric, meaning that each score can be accompanied by a residual that quantifies the extent of the score uncertainty caused by unjudged documents. As part of our experimentation, we use crowd-sourced data and score residuals to demonstrate that a wide range of queries arise for even quite specific information needs, and that these variant queries introduce significant levels of residual uncertainty into typical experimental evaluations. These causes of variability have wide-reaching implications for experiment design, and for the construction of test collections.
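To make the adaptive behavior described in the abstract concrete, the sketch below computes an INST-style score and its residual for a single ranking. It is a minimal illustration that assumes the continuation probability reported in the authors' earlier ADCS 2015 companion note, C(i) = ((i - 1 + T + T_i) / (i + T + T_i))^2, where T_i is the volume of relevance still being sought after rank i; the function name inst_score, the evaluation depth, and the clamping of T_i at zero are illustrative choices made here, not part of the published definition, so the full paper should be treated as authoritative.

# A minimal sketch of an INST-style adaptive weighted-precision computation.
# The continuation probability follows the ADCS 2015 formulation as recalled
# here (an assumption); gains are per-rank values in [0, 1], with None used
# to mark unjudged documents.

def inst_score(gains, T, depth=1000):
    """Return (lower_bound, residual) for a ranking under an INST-like model.

    gains : list of per-rank gains in [0, 1]; None marks an unjudged document.
    T     : the user's target volume of relevance (the search goal).
    depth : evaluation depth; the ranking is padded with non-relevant documents.
    """
    def weighted_precision(r):
        v = 1.0            # probability the user views rank i (V_1 = 1)
        seen = 0.0         # cumulative gain to and including rank i
        num = den = 0.0    # numerator / denominator of the expected rate of gain
        for i, ri in enumerate(r, start=1):
            seen += ri
            num += v * ri
            den += v
            t_i = max(T - seen, 0.0)                        # relevance still sought
            c_i = ((i - 1 + T + t_i) / (i + T + t_i)) ** 2  # continuation probability
            v *= c_i                                        # V_{i+1} = V_i * C(i)
        return num / den

    padded = list(gains) + [None] * max(0, depth - len(gains))
    lower = weighted_precision([0.0 if g is None else g for g in padded])
    upper = weighted_precision([1.0 if g is None else g for g in padded])
    return lower, upper - lower


if __name__ == "__main__":
    # Example: judged gains at the top of the ranking, one unjudged document at rank 4.
    score, residual = inst_score([1.0, 0.0, 1.0, None, 0.5], T=2.0)
    print(f"INST lower bound = {score:.4f}, residual = {residual:.4f}")

As with RBP, the residual in this sketch is obtained by scoring the ranking twice, once with unjudged documents treated as non-relevant and once with them treated as fully relevant, so that the gap quantifies the score uncertainty caused by missing judgments.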



    Published In

ACM Transactions on Information Systems, Volume 35, Issue 3 (July 2017), 410 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3026478

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 05 June 2017
    Accepted: 01 November 2016
    Revised: 01 September 2016
    Received: 01 May 2016
    Published in TOIS Volume 35, Issue 3

    Author Tags

    1. User behavior
    2. effectiveness metric
    3. query
    4. relevance measures
    5. search
    6. test collections

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council's Discovery Projects Scheme
