Users versus models: what observation tells us about effectiveness metrics

Published: 27 October 2013

Abstract

Retrieval system effectiveness can be measured in two quite different ways: by monitoring the behavior of users and gathering data about the ease and accuracy with which they accomplish certain specified information-seeking tasks; or by using numeric effectiveness metrics to score system runs in reference to a set of relevance judgments. In the second approach, the effectiveness metric is chosen in the belief that user task performance, if it were to be measured by the first approach, should be linked to the score provided by the metric.
This work explores that link by analyzing the assumptions and implications of a number of effectiveness metrics, and by examining how these relate to observable user behaviors. Data recorded as part of a user study included users' self-assessments of search task difficulty, gaze position, and click activity. Our results show that user behavior is influenced by a blend of many factors, including the extent to which relevant documents are encountered, the stage of the search process, and task difficulty. These insights can be used to guide the development of batch effectiveness metrics.
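The link between a batch metric and its implicit user model can be made concrete: many metrics differ only in the discount they apply at each rank, and that discount amounts to an assumption about how far down a ranking users are willing to read. The sketch below is our illustration, not material from the paper; it contrasts rank-biased precision (Moffat and Zobel, 2008), which models a user who advances from one rank to the next with fixed probability p, with discounted cumulative gain (Järvelin and Kekäläinen, 2002), which assumes attention decays logarithmically with rank. The parameter value and the toy ranking are invented for the example.

    from math import log2

    def rbp(gains, p=0.8):
        # Rank-biased precision: the document at 0-based rank i is
        # viewed with probability p**i, so gains are discounted
        # geometrically; the (1 - p) factor scales scores into [0, 1].
        return (1 - p) * sum(g * p**i for i, g in enumerate(gains))

    def dcg(gains):
        # Discounted cumulative gain: the gain at 1-based rank r is
        # divided by log2(r + 1), a slower, heavier-tailed discount.
        return sum(g / log2(i + 2) for i, g in enumerate(gains))

    # A run placing its only relevant document at rank 1 versus rank 3:
    print(round(rbp([1, 0, 0]), 3), round(rbp([0, 0, 1]), 3))  # 0.2 0.128
    print(round(dcg([1, 0, 0]), 3), round(dcg([0, 0, 1]), 3))  # 1.0 0.5

Both discounts reward the earlier placement, but they disagree about how much a document at rank 3 is worth; whether either discount matches what users actually do is precisely the question that the gaze and click observations address.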


Published In

CIKM '13: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, October 2013, 2612 pages. Association for Computing Machinery, New York, NY, United States. ISBN 9781450322638. DOI: 10.1145/2505515.

Author Tags

1. evaluation
2. retrieval experiment
3. system measurement

Conference

CIKM '13: 22nd ACM International Conference on Information and Knowledge Management, October 27 - November 1, 2013, San Francisco, California, USA.

    Acceptance Rates

    CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
