ABSTRACT
We explore a set of hypotheses on user behavior that are potentially at the origin of the (Mean) Average Precision (AP) metric. This allows us to propose a more realistic version of AP where users click non-deterministically on relevant documents and where the number of relevant documents in the collection need not be known in advance. We then depart from the assumption that a document is either relevant or irrelevant and instead use graded relevance judgments similar to the editorial labels used for Discounted Cumulated Gain (DCG). We assume that clicked documents provide users with a certain level of "utility" and that a user ends a search once she has gathered enough utility. Based on the query logs of a commercial search engine, we show how to estimate the utility associated with a label from the record of past user interactions with the search engine, and we show how the two different user models can be evaluated based on their ability to predict future clicks accurately. Finally, based on these user models, we propose a measure that captures the relative quality of two rankings.
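The two ideas in the abstract can be made concrete with a small sketch. Robertson [5] reads AP as the expected precision at the rank of a relevant document chosen uniformly at random, i.e. the rank at which a user who wants one relevant result happens to stop. The utility-based model replaces the binary stop with a stopping rule over graded gains. The function names, and the fixed utility threshold in the last function, are illustrative choices of mine, not definitions taken from the paper:

```python
def average_precision(relevance):
    """Classical AP over a ranked list of binary relevance judgments (1/0)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def expected_stop_precision(relevance):
    """Expected precision at the rank of a uniformly chosen relevant document.

    Under Robertson's user-model reading of AP, this expectation equals AP:
    the user stops at some relevant document, each being equally likely.
    """
    rel_ranks = [r for r, rel in enumerate(relevance, start=1) if rel]
    if not rel_ranks:
        return 0.0
    total = 0.0
    for stop in rel_ranks:
        total += sum(relevance[:stop]) / stop  # precision at the stop rank
    return total / len(rel_ranks)

def stop_rank_by_utility(gains, target):
    """Illustrative utility-based stopping rule (threshold is hypothetical):
    the user scans the ranking top-down, accumulates the graded gain of each
    document, and stops once the accumulated utility reaches `target`.
    """
    total = 0.0
    for rank, g in enumerate(gains, start=1):
        total += g
        if total >= target:
            return rank
    return len(gains)  # the user exhausts the list without reaching the target
```

For example, `average_precision([1, 0, 1, 1, 0])` and `expected_stop_precision([1, 0, 1, 1, 0])` both evaluate to (1 + 2/3 + 3/4)/3, illustrating that the deterministic metric and the stochastic user model coincide in expectation; the utility model generalizes the stopping behavior to graded labels.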
- G. Dupret. User models to compare and evaluate web IR metrics. In Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation, 2009.
- G. Dupret and C. Liao. Estimating intrinsic document relevance from clicks. In Proceedings of the 3rd ACM WSDM Conference, 2010.
- D. Kelly. Methods for Evaluating Interactive Information Retrieval Systems with Users, volume 3 of Foundations and Trends in Information Retrieval. 2009.
- A. Moffat and J. Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst., 27(1):1--27, 2008.
- S. Robertson. A new interpretation of average precision. In Proceedings of SIGIR '08, pages 689--690, New York, NY, USA, 2008. ACM.
- E. M. Voorhees and D. Harman, editors. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
- K. Wang, T. Walker, and Z. Zheng. Pskip: estimating relevance ranking quality from web search clickthrough data. In Proceedings of the 15th ACM SIGKDD, pages 1355--1364, New York, NY, USA, 2009. ACM.

More complete references for this work can be found in [1].
Index Terms
- A user behavior model for average precision and its generalization to graded judgments