DOI: 10.1145/3477495.3531819

From Cluster Ranking to Document Ranking

Published: 07 July 2022

Abstract

The common approach to using clusters of similar documents for ad hoc document retrieval is to rank the clusters in response to the query and then transform the cluster ranking into a document ranking. We present a novel supervised approach for transforming a cluster ranking into a document ranking. The approach makes it possible to simultaneously utilize different clusterings and the resultant cluster rankings, which helps to improve the modeling of the document similarity space. Empirical evaluation shows that our approach yields performance that substantially transcends the state of the art in cluster-based document retrieval.
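As a rough illustration of the setting (not the paper's supervised method), one simple unsupervised way to turn cluster rankings from several clusterings into a single document ranking is to let each document inherit reciprocal-rank credit from every ranked cluster that contains it, in the spirit of reciprocal rank fusion. The function name and toy data below are hypothetical:

```python
def rrf_document_ranking(clusterings, k=60):
    """Hypothetical sketch: fuse cluster rankings into a document ranking.

    clusterings: a list of ranked cluster lists; each cluster is a
    collection of document IDs, ordered from most to least relevant.
    Each document accumulates a reciprocal-rank contribution, 1/(k + rank),
    from every cluster that contains it, across all clusterings.
    """
    scores = {}
    for ranked_clusters in clusterings:
        for rank, cluster in enumerate(ranked_clusters, start=1):
            for doc in cluster:
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Documents favored by highly ranked clusters in many clusterings rise.
    return sorted(scores, key=scores.get, reverse=True)

# Two toy clusterings of the same collection, each already ranked.
c1 = [["d1", "d2"], ["d3"]]
c2 = [["d2", "d3"], ["d1"]]
print(rrf_document_ranking([c1, c2]))  # "d2" ranks first: top cluster in both
```

Here fusing the two clusterings promotes "d2", which sits in the top-ranked cluster of both; this is the intuition behind combining multiple clusterings, though the paper learns the transformation in a supervised manner rather than using a fixed fusion formula.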


Cited By

  • (2024) Reliable Information Retrieval Systems Performance Evaluation: A Review. IEEE Access 12, 51740–51751. DOI: 10.1109/ACCESS.2024.3377239


    Published In

    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. ad hoc retrieval
    2. cluster ranking
    3. document ranking

    Qualifiers

    • Short-paper

    Funding Sources

    • VATAT

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

