Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval

Published: 23 June 2015

Abstract

We present a non-traditional retrieval problem that we call subtopic retrieval: finding documents that cover many different subtopics of a query topic. In this setting, the utility of a document in a ranking depends on the other documents in the ranking, which violates the independent-relevance assumption made by most traditional retrieval methods. Subtopic retrieval poses challenges both for evaluating performance and for developing effective algorithms. We propose an evaluation framework for subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy among documents. We then propose and systematically evaluate several subtopic retrieval methods based on statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query-likelihood relevance ranking modestly outperforms a baseline relevance ranking on a data set used in the TREC interactive track.
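To make the dependence between ranked documents concrete, the following minimal sketch shows a greedy MMR-style re-ranking together with a simple subtopic coverage measure (the fraction of a topic's subtopics covered by the top-k documents). It is an illustration only, not the method evaluated in the paper: the relevance scores, pairwise similarities, and subtopic labels are hypothetical, hand-picked values, whereas the paper derives its scores from statistical language models, and its evaluation framework additionally accounts for topic difficulty. The function names `mmr_rank` and `subtopic_recall` and the trade-off parameter `lam` are assumptions introduced for this example.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def subtopic_recall(ranking: Sequence[str], subtopics: Dict[str, Set[str]],
                    n_subtopics: int, k: int) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc_id in ranking[:k]:
        covered |= subtopics.get(doc_id, set())
    return len(covered) / n_subtopics


def mmr_rank(relevance: Dict[str, float],
             similarity: Dict[FrozenSet[str], float],
             lam: float = 0.5) -> List[str]:
    """Greedy MMR-style ranking over precomputed scores (hypothetical inputs).

    At each step, pick the document maximizing
    lam * relevance - (1 - lam) * (max similarity to any already-selected doc).
    """
    remaining = set(relevance)
    selected: List[str] = []

    def score(doc_id: str) -> float:
        redundancy = max(
            (similarity.get(frozenset((doc_id, s)), 0.0) for s in selected),
            default=0.0,
        )
        return lam * relevance[doc_id] - (1.0 - lam) * redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: d1 and d2 are near-duplicates covering subtopics a, b;
# d3 is less relevant but covers a different subtopic, c.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
similarity = {
    frozenset(("d1", "d2")): 0.95,
    frozenset(("d1", "d3")): 0.10,
    frozenset(("d2", "d3")): 0.10,
}
subtopics = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}

by_relevance = sorted(relevance, key=relevance.get, reverse=True)
by_mmr = mmr_rank(relevance, similarity, lam=0.5)

print(by_relevance, subtopic_recall(by_relevance, subtopics, 3, k=2))
print(by_mmr, subtopic_recall(by_mmr, subtopics, 3, k=2))
```

On this toy data, ranking purely by relevance places the two near-duplicates first and covers two of the three subtopics in the top two results, while the MMR-style ranking promotes d3 to second place and covers all three.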

    Published In

    ACM SIGIR Forum  Volume 49, Issue 1
    June 2015
    69 pages
    ISSN:0163-5840
    DOI:10.1145/2795403

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 June 2015
    Published in SIGIR Volume 49, Issue 1

    Author Tags

    1. Subtopic retrieval
    2. language models
    3. maximal marginal relevance

    Qualifiers

    • Column
