DOI: 10.1145/3077136.3080841
Research article · ACM SIGIR conference proceedings

Evaluating Web Search with a Bejeweled Player Model

Published: 07 August 2017

Abstract

The design of a Web search evaluation metric is closely related to how the user's interaction process is modeled. Each behavioral model results in a different metric used to evaluate search performance. In these models and the user behavior assumptions behind them, when a user ends a search session is one of the prime concerns, because it strongly affects both benefit and cost estimation. Existing metric designs usually adopt simplified criteria to decide the stopping point: (1) an upper limit on benefit (e.g. RR, AP); or (2) an upper limit on cost (e.g. Precision@N, DCG@N). However, in many practical search sessions (e.g. exploratory search), the stopping criterion is more complex than these simplified cases. Analyzing the benefit and cost of actual users' search sessions, we find that stopping criteria vary with search tasks and are usually combined effects of both benefit and cost factors. Inspired by a popular computer game named Bejeweled, we propose a Bejeweled Player Model (BPM) to simulate users' search interaction processes and evaluate their search performance. In the BPM, a user stops when he/she either has found sufficient useful information or has no more patience to continue. Given this assumption, we propose a new evaluation framework based on upper limits (either fixed or changeable as the search proceeds) for both benefit and cost. We show how to derive a new metric from the framework and demonstrate that it can be adopted to revise traditional metrics such as Discounted Cumulative Gain (DCG), Expected Reciprocal Rank (ERR) and Average Precision (AP). To show the effectiveness of the proposed framework, we compare it with a number of existing metrics in terms of their correlation with user satisfaction, based on a dataset that collects users' explicit satisfaction feedback and assessors' relevance judgments. Experimental results show that the framework correlates better with user satisfaction feedback.
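
The following Python sketch is meant only to make the stopping rule above concrete: a simulated user scans the ranked list, accumulating benefit and cost, and stops as soon as either a benefit upper limit or a cost upper limit is reached. The function name bpm_score, the per-result gains/costs inputs, the fixed limits, and the normalization by the benefit limit are illustrative assumptions for this sketch, not the paper's exact formulation (which also allows limits that change as the search proceeds).

def bpm_score(gains, costs, benefit_limit, cost_limit):
    """Hedged sketch of a BPM-style metric with fixed benefit and cost upper limits."""
    benefit, cost = 0.0, 0.0
    for gain, unit_cost in zip(gains, costs):
        cost += unit_cost
        if cost > cost_limit:            # patience exhausted before examining this result
            break
        benefit += gain
        if benefit >= benefit_limit:     # sufficient useful information has been found
            break
    # Normalizing by the benefit limit is one plausible choice, not the paper's definition.
    return min(benefit, benefit_limit) / benefit_limit

# Example: graded relevance as per-result benefit, one unit of cost per result examined.
print(bpm_score(gains=[3, 0, 2, 1], costs=[1, 1, 1, 1],
                benefit_limit=4, cost_limit=3))      # -> 1.0: the benefit limit is hit at rank 3

Under the same assumptions, replacing the raw gains with rank-discounted gains would give a DCG-style variant of this sketch, which is the spirit in which the paper revises traditional metrics.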

    Information

    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017
    1476 pages
    ISBN:9781450350228
    DOI:10.1145/3077136
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 August 2017

    Badges

    • Best Student Paper

    Author Tags

    1. benefit and cost
    2. evaluation metrics
    3. user model

    Qualifiers

    • Research-article

    Conference

    SIGIR '17

    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions, 22%
    Overall Acceptance Rate: 792 of 3,983 submissions, 20%

    Cited By

    • (2024) What Matters in a Measure? A Perspective from Large-Scale Search Evaluation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 282-292. DOI: 10.1145/3626772.3657845. Online publication date: 10-Jul-2024.
    • (2024) User-oriented metrics for search engine deterministic sort orders. Information Processing & Management 61(1), 103547. DOI: 10.1016/j.ipm.2023.103547. Online publication date: Jan-2024.
    • (2024) An Intrinsic Framework of Information Retrieval Evaluation Measures. In Intelligent Systems and Applications, 692-713. DOI: 10.1007/978-3-031-47721-8_47. Online publication date: 10-Jan-2024.
    • (2023) A Reference-Dependent Model for Web Search Evaluation. In Proceedings of the ACM Web Conference 2023, 3396-3405. DOI: 10.1145/3543507.3583551. Online publication date: 30-Apr-2023.
    • (2023) Practice and Challenges in Building a Business-oriented Search Engine Quality Metric. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3295-3299. DOI: 10.1145/3539618.3591841. Online publication date: 19-Jul-2023.
    • (2023) Investigating the role of in-situ user expectations in Web search. Information Processing & Management 60(3), 103300. DOI: 10.1016/j.ipm.2023.103300. Online publication date: May-2023.
    • (2023) Formally Modeling Users in Information Retrieval. In A Behavioral Economics Approach to Interactive Information Retrieval, 23-64. DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18-Feb-2023.
    • (2022) Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2486-2496. DOI: 10.1145/3511808.3557312. Online publication date: 17-Oct-2022.
    • (2022) The Dark Side of Relevance: The Effect of Non-Relevant Results on Search Behavior. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, 1-11. DOI: 10.1145/3498366.3505770. Online publication date: 14-Mar-2022.
    • (2022) Constructing Better Evaluation Metrics by Incorporating the Anchoring Effect into the User Model. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2709-2714. DOI: 10.1145/3477495.3531953. Online publication date: 6-Jul-2022.
