DOI: 10.1145/3077136.3080841
Research article · ACM SIGIR conference proceedings

Evaluating Web Search with a Bejeweled Player Model

Published: 07 August 2017

Abstract

The design of a Web search evaluation metric is closely related to how the user's interaction process is modeled. Each behavioral model results in a different metric used to evaluate search performance. In these models and the user behavior assumptions behind them, when a user ends a search session is one of the prime concerns, because it strongly affects both benefit and cost estimation. Existing metric designs usually adopt simplified criteria to decide the stopping point: (1) an upper limit on benefit (e.g. RR, AP); or (2) an upper limit on cost (e.g. Precision@N, DCG@N). However, in many practical search sessions (e.g. exploratory search), the stopping criterion is more complex than these simplified cases. Analyzing the benefit and cost of actual users' search sessions, we find that stopping criteria vary with search tasks and are usually combined effects of both benefit and cost factors. Inspired by a popular computer game named Bejeweled, we propose a Bejeweled Player Model (BPM) to simulate users' search interaction processes and evaluate their search performance. In the BPM, a user stops when he/she either has found sufficient useful information or has no more patience to continue. Given this assumption, we propose a new evaluation framework based on upper limits (either fixed or changeable as the search proceeds) for both benefit and cost. We show how to derive a new metric from the framework and demonstrate that it can be adopted to revise traditional metrics such as Discounted Cumulative Gain (DCG), Expected Reciprocal Rank (ERR) and Average Precision (AP). To show the effectiveness of the proposed framework, we compare it with a number of existing metrics in terms of their correlation with user satisfaction, based on a dataset that collects users' explicit satisfaction feedback and assessors' relevance judgments. Experimental results show that the framework correlates better with user satisfaction feedback.
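
The following Python sketch is meant only to make the stopping rule above concrete: a simulated user scans the ranked list, accumulating benefit and cost, and stops as soon as either a benefit upper limit or a cost upper limit is reached. The function name bpm_score, the per-result gains/costs inputs, the fixed limits, and the normalization by the benefit limit are illustrative assumptions for this sketch, not the paper's exact formulation (which also allows limits that change as the search proceeds).

def bpm_score(gains, costs, benefit_limit, cost_limit):
    """Hedged sketch of a BPM-style metric with fixed benefit and cost upper limits."""
    benefit, cost = 0.0, 0.0
    for gain, unit_cost in zip(gains, costs):
        cost += unit_cost
        if cost > cost_limit:            # patience exhausted before examining this result
            break
        benefit += gain
        if benefit >= benefit_limit:     # sufficient useful information has been found
            break
    # Normalizing by the benefit limit is one plausible choice, not the paper's definition.
    return min(benefit, benefit_limit) / benefit_limit

# Example: graded relevance as per-result benefit, one unit of cost per result examined.
print(bpm_score(gains=[3, 0, 2, 1], costs=[1, 1, 1, 1],
                benefit_limit=4, cost_limit=3))      # -> 1.0: the benefit limit is hit at rank 3

Under the same assumptions, replacing the raw gains with rank-discounted gains would give a DCG-style variant of this sketch, which is the spirit in which the paper revises traditional metrics.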

    Information

    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017
    1476 pages
    ISBN:9781450350228
    DOI:10.1145/3077136
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 August 2017

    Badges

    • Best Student Paper

    Author Tags

    1. benefit and cost
    2. evaluation metrics
    3. user model

    Qualifiers

    • Research-article

    Conference

    SIGIR '17

    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions, 22%
    Overall Acceptance Rate: 792 of 3,983 submissions, 20%

    Cited By

    • (2024) What Matters in a Measure? A Perspective from Large-Scale Search Evaluation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 282-292. DOI: 10.1145/3626772.3657845. Online publication date: 10-Jul-2024.
    • (2024) User-oriented metrics for search engine deterministic sort orders. Information Processing & Management 61(1), 103547. DOI: 10.1016/j.ipm.2023.103547. Online publication date: Jan-2024.
    • (2024) An Intrinsic Framework of Information Retrieval Evaluation Measures. In Intelligent Systems and Applications, 692-713. DOI: 10.1007/978-3-031-47721-8_47. Online publication date: 10-Jan-2024.
    • (2023) A Reference-Dependent Model for Web Search Evaluation. In Proceedings of the ACM Web Conference 2023, 3396-3405. DOI: 10.1145/3543507.3583551. Online publication date: 30-Apr-2023.
    • (2023) Practice and Challenges in Building a Business-oriented Search Engine Quality Metric. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3295-3299. DOI: 10.1145/3539618.3591841. Online publication date: 19-Jul-2023.
    • (2023) Investigating the role of in-situ user expectations in Web search. Information Processing & Management 60(3), 103300. DOI: 10.1016/j.ipm.2023.103300. Online publication date: May-2023.
    • (2023) Formally Modeling Users in Information Retrieval. In A Behavioral Economics Approach to Interactive Information Retrieval, 23-64. DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18-Feb-2023.
    • (2022) Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2486-2496. DOI: 10.1145/3511808.3557312. Online publication date: 17-Oct-2022.
    • (2022) The Dark Side of Relevance: The Effect of Non-Relevant Results on Search Behavior. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, 1-11. DOI: 10.1145/3498366.3505770. Online publication date: 14-Mar-2022.
    • (2022) Constructing Better Evaluation Metrics by Incorporating the Anchoring Effect into the User Model. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2709-2714. DOI: 10.1145/3477495.3531953. Online publication date: 6-Jul-2022.
