column

Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval

Authors: ChengXiang Zhai, William W. Cohen, John Lafferty

ACM SIGIR Forum, Volume 49, Issue 1

Pages 2 - 9

https://doi.org/10.1145/2795403.2795405

Published: 23 June 2015

Abstract

We present a non-traditional retrieval problem that we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking depends on the other documents in the ranking, which violates the independent-relevance assumption made by most traditional retrieval methods. Subtopic retrieval poses challenges both for evaluating performance and for developing effective algorithms. We propose a framework for evaluating subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
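To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic greedy MMR re-ranker of the form lam * relevance - (1 - lam) * max similarity to already selected documents, and a simple subtopic recall at rank k (the fraction of a topic's subtopics covered by the top k documents). The function names, the Jaccard similarity over term sets, and the toy documents and scores are all hypothetical; the paper's own methods rely on statistical language models, which are not reproduced here.

```python
from typing import Callable, Dict, List, Set


def subtopic_recall(ranking: List[str],
                    doc_subtopics: Dict[str, Set[str]],
                    n_subtopics: int,
                    k: int) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / n_subtopics


def mmr_rerank(relevance: Dict[str, float],
               similarity: Callable[[str, str], float],
               lam: float = 0.5) -> List[str]:
    """Greedy MMR: repeatedly pick the document that best trades off
    relevance against similarity to the documents already selected."""
    remaining = set(relevance)
    selected: List[str] = []
    while remaining:
        def score(d: str) -> float:
            max_sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * max_sim
        # sorted() makes tie-breaking deterministic (first id wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: d1 and d2 are near-duplicates, d3 is novel.
terms = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c", "d"}}
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
doc_subtopics = {"d1": {"s1"}, "d2": {"s1"}, "d3": {"s2"}}


def jaccard(x: str, y: str) -> float:
    """Term-set overlap, used here as an assumed similarity measure."""
    return len(terms[x] & terms[y]) / len(terms[x] | terms[y])


by_relevance = sorted(relevance, key=relevance.get, reverse=True)
by_mmr = mmr_rerank(relevance, jaccard, lam=0.5)

print(by_relevance, subtopic_recall(by_relevance, doc_subtopics, 2, k=2))
print(by_mmr, subtopic_recall(by_mmr, doc_subtopics, 2, k=2))
```

In this toy case the pure relevance ranking places the two redundant documents first and covers one of two subtopics at rank 2, while the MMR ordering promotes the novel document and covers both. This illustrates why a document's utility depends on what is ranked above it.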

Index Terms

Information systems
  Information retrieval

Published In

ACM SIGIR Forum, Volume 49, Issue 1, June 2015, 69 pages

ISSN: 0163-5840

DOI: 10.1145/2795403

Editors: Ben Carterette (University of Delaware, Newark, DE, USA), Craig Macdonald (University of Glasgow, Glasgow, UK)

Copyright © 2015 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
