DOI: 10.1145/3477495.3531819

From Cluster Ranking to Document Ranking

Published: 07 July 2022

Abstract

The common approach to using clusters of similar documents for ad hoc document retrieval is to rank the clusters in response to the query and then transform the cluster ranking into a document ranking. We present a novel supervised approach for transforming a cluster ranking into a document ranking. The approach makes it possible to simultaneously utilize different clusterings and the resultant cluster rankings, which helps to improve the modeling of the document similarity space. Empirical evaluation shows that our approach yields performance that substantially transcends the state of the art in cluster-based document retrieval.
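As a rough illustration of the setting (not the paper's supervised method), one simple unsupervised way to turn cluster rankings from several clusterings into a single document ranking is to let each document inherit reciprocal-rank credit from every ranked cluster that contains it, in the spirit of reciprocal rank fusion. The function name and toy data below are hypothetical:

```python
def rrf_document_ranking(clusterings, k=60):
    """Hypothetical sketch: fuse cluster rankings into a document ranking.

    clusterings: a list of ranked cluster lists; each cluster is a
    collection of document IDs, ordered from most to least relevant.
    Each document accumulates a reciprocal-rank contribution, 1/(k + rank),
    from every cluster that contains it, across all clusterings.
    """
    scores = {}
    for ranked_clusters in clusterings:
        for rank, cluster in enumerate(ranked_clusters, start=1):
            for doc in cluster:
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Documents favored by highly ranked clusters in many clusterings rise.
    return sorted(scores, key=scores.get, reverse=True)

# Two toy clusterings of the same collection, each already ranked.
c1 = [["d1", "d2"], ["d3"]]
c2 = [["d2", "d3"], ["d1"]]
print(rrf_document_ranking([c1, c2]))  # "d2" ranks first: top cluster in both
```

Here fusing the two clusterings promotes "d2", which sits in the top-ranked cluster of both; this is the intuition behind combining multiple clusterings, though the paper learns the transformation in a supervised manner rather than using a fixed fusion formula.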


Cited By

  • (2024) Reliable Information Retrieval Systems Performance Evaluation: A Review. IEEE Access 12, 51740–51751. DOI: 10.1109/ACCESS.2024.3377239


    Published In

    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. ad hoc retrieval
    2. cluster ranking
    3. document ranking

    Qualifiers

    • Short-paper

    Funding Sources

    • VATAT

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

