DOI: 10.1145/2911451.2911507
research-article

When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Published: 07 July 2016

Abstract

Relevance is a fundamental concept in information retrieval (IR) studies. However, it is often observed that relevance as annotated by secondary assessors does not necessarily correspond to the usefulness and satisfaction perceived by users. In this work, we confirm the difference through a laboratory study in which we collect relevance annotations from external assessors, and usefulness and satisfaction feedback from users, for a set of search tasks. We also find that a measure based on usefulness labels correlates better with user satisfaction than one based on relevance annotations. However, we show that external assessors are capable of annotating usefulness when provided with more search context information. In addition, we show that usefulness labels can be generated automatically when some training data is available. Our findings explain why traditional system-centric evaluation metrics are not well aligned with user satisfaction, and suggest that a usefulness-based evaluation method can be defined to better reflect the quality of search systems as perceived by users.
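
The comparison described in the abstract, correlating a label-based measure with user satisfaction for two different label sources, can be sketched roughly as follows. This is only an illustration under stated assumptions: the undiscounted cumulated gain used here is a stand-in and not necessarily the measure used in the paper, the function names and the session layout are hypothetical, and `TOY_SESSIONS` is synthetic data made up for the example, not results from the study.

```python
import math

# Synthetic, hand-made sessions for illustration only (NOT data from the study).
# Each session has graded labels per viewed document and one satisfaction score.
TOY_SESSIONS = [
    {"relevance": [3, 3, 2], "usefulness": [1, 0, 0], "satisfaction": 1},
    {"relevance": [2, 1, 0], "usefulness": [3, 2, 0], "satisfaction": 4},
    {"relevance": [3, 2, 2], "usefulness": [2, 2, 1], "satisfaction": 4},
    {"relevance": [1, 1, 0], "usefulness": [0, 1, 0], "satisfaction": 2},
]


def cumulated_gain(labels):
    """Sum the graded labels of one session (assumed measure, no rank discount)."""
    return float(sum(labels))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def compare_label_sources(sessions):
    """Correlate session-level gain with satisfaction for each label source."""
    satisfaction = [s["satisfaction"] for s in sessions]
    result = {}
    for source in ("relevance", "usefulness"):
        gains = [cumulated_gain(s[source]) for s in sessions]
        result[source] = pearson(gains, satisfaction)
    return result


if __name__ == "__main__":
    for source, r in compare_label_sources(TOY_SESSIONS).items():
        print(f"{source}: r = {r:.3f}")
```

With real data, the same per-session satisfaction scores would be paired once with gains computed from assessor relevance labels and once with gains computed from user usefulness labels, and the two correlation coefficients compared.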

      Published In

      SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
      July 2016
      1296 pages
      ISBN:9781450340694
      DOI:10.1145/2911451
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. evaluation
      2. relevance
      3. usefulness
      4. user satisfaction

      Qualifiers

      • Research-article

      Conference

      SIGIR '16

      Acceptance Rates

      SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (Last 12 months): 67
      • Downloads (Last 6 weeks): 8
      Reflects downloads up to 27 Feb 2025

      Cited By

      • (2024) Decoy Effect in Search Interaction: Understanding User Behavior and Measuring System Vulnerability. ACM Transactions on Information Systems 43:2 (1-58). DOI: 10.1145/3708884. Online publication date: 19-Dec-2024
      • (2024) Modeling Attentive Interaction Behavior for Web Content Identification in Exploratory Information Seeking. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:4 (1-28). DOI: 10.1145/3699750. Online publication date: 21-Nov-2024
      • (2024) AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (54-63). DOI: 10.1145/3673791.3698420. Online publication date: 8-Dec-2024
      • (2024) Understanding users' dynamic perceptions of search gain and cost in sessions: An expectation confirmation model. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24935. Online publication date: 17-Jun-2024
      • (2024) An empirical exploration of the subjectivity problem of information qualities. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24884. Online publication date: 25-Mar-2024
      • (2024) How much freedom does an effectiveness metric really have? Journal of the Association for Information Science and Technology 75:6 (686-703). DOI: 10.1002/asi.24874. Online publication date: 15-Feb-2024
      • (2023) Constants and Variables: How Does the Visual Representation of the Holocaust by AI Change Over Time. Eastern European Holocaust Studies 1:2 (365-371). DOI: 10.1515/eehs-2023-0055. Online publication date: 27-Nov-2023
      • (2023) Report on the Dagstuhl Seminar on Frontiers of Information Access Experimentation for Research and Education. ACM SIGIR Forum 57:1 (1-28). DOI: 10.1145/3636341.3636351. Online publication date: 4-Dec-2023
      • (2023) On the Reliability of User Feedback for Evaluating the Quality of Conversational Agents. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (4185-4189). DOI: 10.1145/3583780.3615286. Online publication date: 21-Oct-2023
      • (2023) Metric-agnostic Ranking Optimization. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2669-2680). DOI: 10.1145/3539618.3591935. Online publication date: 19-Jul-2023
