DOI: 10.1145/3121050.3121064

Dealing with Incomplete Judgments in Cascade Measures

Published: 01 October 2017

Abstract

Cascade measures like alpha-nDCG, ERR-IA, and NRBP take into account the novelty and diversity of query results and are computed from human relevance judgments, which are costly to collect. These measures expect that all documents in a query's result list are judged, and they cannot make use of judgments beyond the assigned labels. Existing work has demonstrated that condensing the query results by removing unjudged documents can address this problem to some extent. However, how highly incomplete judgments affect cascade measures, and how to cope with such incompleteness, has not been addressed yet. In this paper, we propose an approach that mitigates incomplete judgments by leveraging language models estimated from the content of documents relevant to the query's subtopics. These language models are estimated at each rank, taking into account the document at that rank and the documents ranked above it. Our method then determines gain values based on the Kullback-Leibler divergence between the language models. Experiments on the diversity tasks of the TREC Web Track 2009--2012 show that, with only 15% of the judgments, our method accurately reconstructs the original rankings determined by the established cascade measures.
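As an illustration of the kind of computation described above, the sketch below builds smoothed unigram language models and compares them with the Kullback-Leibler divergence. This is a minimal sketch under stated assumptions: the abstract does not specify how the language models are estimated or how divergence is mapped to gain values, so the smoothing scheme, the `estimated_gain` function, and the divergence-to-gain mapping are illustrative placeholders rather than the authors' exact method.

```python
import math
from collections import Counter


def unigram_lm(docs, vocab, mu=1.0):
    """Smoothed unigram language model over a list of tokenized documents.

    Additive smoothing (an assumption, not necessarily the paper's choice)
    keeps every vocabulary term at non-zero probability so that the
    KL divergence below stays finite.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    total = sum(counts.values()) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def estimated_gain(ranked_docs, rank, subtopic_relevant_docs, vocab):
    """Illustrative gain estimate for the document at `rank` (0-based).

    Compares a language model built from documents known to be relevant to
    a subtopic against a model of the documents retrieved up to and
    including this rank. The divergence-to-gain mapping is a placeholder.
    """
    relevant_lm = unigram_lm(subtopic_relevant_docs, vocab)
    seen_lm = unigram_lm(ranked_docs[:rank + 1], vocab)
    divergence = kl_divergence(relevant_lm, seen_lm)
    # Smaller divergence means the ranking up to this point covers the
    # subtopic's relevant content better, so assign a higher gain.
    return 1.0 / (1.0 + divergence)


# Toy usage with hypothetical tokenized documents.
relevant = [["solar", "panel", "cost"], ["solar", "energy", "price"]]
ranking = [["solar", "panel", "installation"], ["football", "scores"]]
vocab = {t for doc in relevant + ranking for t in doc}
print(estimated_gain(ranking, 0, relevant, vocab))
print(estimated_gain(ranking, 1, relevant, vocab))
```

In the setting of the paper, such content-based gain estimates would stand in for the labels of unjudged documents when only a small fraction (here, 15%) of the judgments is available.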


Cited By

  • One-Shot Labeling for Automatic Relevance Estimation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2230--2235. Online publication date: 19 July 2023. DOI: 10.1145/3539618.3592032

Published In

ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
October 2017
348 pages
ISBN: 9781450344906
DOI: 10.1145/3121050
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ir evaluation
  2. low-cost evaluation
  3. novelty and diversity

Qualifiers

  • Research-article

Conference

ICTIR '17

Acceptance Rates

ICTIR '17 Paper Acceptance Rate: 27 of 54 submissions, 50%
Overall Acceptance Rate: 235 of 527 submissions, 45%
