ABSTRACT
There has been considerable interest in incorporating diversity into search results to account for redundancy and the space of possible user needs. Most work on this problem is based on subtopics: diversity rankers score documents against a set of hypothesized subtopics, and diversity rankings are evaluated by assigning a value to each ranked document based on the number of novel (and redundant) subtopics it is relevant to. This can be seen as modeling a user who is always interested in seeing more novel subtopics, with progressively decreasing interest in seeing the same subtopic multiple times. We put this model to the test: if it is correct, then users, given a choice, should prefer the document that contributes more value to the evaluation. We formulate specific hypotheses from this model and test them with actual users in a novel preference-based design in which users express a preference for document A or document B given document C. We argue that while the user study largely supports the subtopic model, many factors beyond novelty and redundancy may influence user preferences. Motivated by this, we introduce a new framework for constructing an ideal diversity ranking using only preference judgments, with no explicit subtopic judgments whatsoever.
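To make the two evaluation views concrete, here is a minimal Python sketch; all function names, the redundancy parameter, and the toy data are illustrative assumptions, not the paper's code. `subtopic_gain` computes a per-document value that decays with subtopic redundancy, in the style of alpha-nDCG (Clarke et al., 2008), and `ideal_ranking_from_prefs` greedily builds an "ideal" ranking from conditional preferences of the form "given the documents already shown, would you rather see A or B next", as the abstract proposes.

```python
ALPHA = 0.5  # redundancy discount, in the style of alpha-nDCG (assumed value)

def subtopic_gain(ranking, subtopics_of):
    """Per-document value under the subtopic model: each subtopic a
    document covers is worth (1 - ALPHA)**k, where k is the number of
    earlier documents that already covered that subtopic."""
    seen = {}  # subtopic -> number of times covered so far
    gains = []
    for doc in ranking:
        gains.append(sum((1 - ALPHA) ** seen.get(s, 0) for s in subtopics_of[doc]))
        for s in subtopics_of[doc]:
            seen[s] = seen.get(s, 0) + 1
    return gains

def ideal_ranking_from_prefs(docs, prefer):
    """Greedy 'ideal' diversity ranking built from conditional preference
    judgments alone: prefer(a, b, context) names the document a user would
    rather see next, given the documents already shown (the context)."""
    remaining, ranking = list(docs), []
    while remaining:
        best = remaining[0]
        for cand in remaining[1:]:
            best = prefer(best, cand, tuple(ranking))
        ranking.append(best)
        remaining.remove(best)
    return ranking

# Toy check: simulate the preference oracle with the subtopic model itself,
# so the two views agree on the ideal ordering.
subtopics_of = {"d1": {"s1"}, "d2": {"s1", "s2"}, "d3": {"s3"}}

def simulated_prefer(a, b, context):
    seen = {}
    for d in context:
        for s in subtopics_of[d]:
            seen[s] = seen.get(s, 0) + 1
    value = lambda d: sum((1 - ALPHA) ** seen.get(s, 0) for s in subtopics_of[d])
    return a if value(a) >= value(b) else b

print(ideal_ranking_from_prefs(["d1", "d2", "d3"], simulated_prefer))
# -> ['d2', 'd3', 'd1']: d2 covers two novel subtopics, d3 then adds a new
#    one, and d1 comes last because s1 has already been seen.
print(subtopic_gain(["d2", "d3", "d1"], subtopics_of))
# -> [2.0, 1.0, 0.5]
```

With human judges, such conditional preferences need not be transitive or mutually consistent, so the greedy tournament above is only the simplest reading of the preference-based framework, not the paper's actual construction.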