research-article

Selective Cluster-Based Document Retrieval

Authors:

Ido Guy

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Pages 1473 - 1482

https://doi.org/10.1145/2983323.2983737

Published: 24 October 2016

Abstract

We address the long-standing challenge of selective cluster-based retrieval: deciding, on a per-query basis, whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose several sets of features based on those utilized by the cluster-based ranker, query-performance predictors, and properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query expansion, and term proximity methods.
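
The per-query decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a classifier or give concrete features, so the logistic-regression learner, the two feature names (`qpp_score`, `cluster_cohesion`), the toy training data, and all function names below are assumptions made purely for illustration.

```python
import math
from typing import Callable, List, Sequence

Ranker = Callable[[str], List[str]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score; the last weight is the bias term."""
    return sum(w * x for w, x in zip(weights, features)) + weights[-1]


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit logistic regression with plain stochastic gradient ascent.

    Label 1 means "cluster-based retrieval was the better choice for this
    query"; label 0 means standard document retrieval was better.
    """
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - sigmoid(score(weights, xi))
            for j, xj in enumerate(xi):
                weights[j] += lr * err * xj
            weights[-1] += lr * err
    return weights


def use_cluster_ranker(weights: Sequence[float], features: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Per-query decision: True selects the cluster-based ranker."""
    return sigmoid(score(weights, features)) >= threshold


def selective_retrieve(query: str, features: Sequence[float],
                       weights: Sequence[float],
                       cluster_ranker: Ranker, doc_ranker: Ranker) -> List[str]:
    """Route the query to one of the two rankers based on the classifier."""
    chosen = cluster_ranker if use_cluster_ranker(weights, features) else doc_ranker
    return chosen(query)


# Hypothetical per-query features: [qpp_score, cluster_cohesion].
# The values and labels are made up for illustration only.
TOY_X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
TOY_Y = [1, 1, 0, 0]
```

In use, one would train on queries with known outcomes, for example `weights = train_logistic(TOY_X, TOY_Y)`, and then call `selective_retrieve(query, features, weights, cluster_ranker, doc_ranker)` at query time. Any binary classifier could stand in for the logistic regression here.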

Cited By

Markovskiy E, Raiber F, Sabach S, Kurland O, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). From Cluster Ranking to Document Ranking. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2137-2141. DOI: 10.1145/3477495.3531819. Online publication date: 6-Jul-2022
https://dl.acm.org/doi/10.1145/3477495.3531819
Djenouri Y, Belhadi A, Djenouri D, Lin J (2020). Cluster-based information retrieval using pattern mining. Applied Intelligence. DOI: 10.1007/s10489-020-01922-x. Online publication date: 17-Oct-2020
https://doi.org/10.1007/s10489-020-01922-x
Kanavos A, Kotoula P, Makris C, Iliadis L (2019). Employing query disambiguation using clustering techniques. Evolving Systems, 11:2, pages 305-315. DOI: 10.1007/s12530-019-09292-7. Online publication date: 11-Jul-2019
https://doi.org/10.1007/s12530-019-09292-7

Index Terms

Selective Cluster-Based Document Retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

October 2016

2566 pages

ISBN:9781450340731

DOI:10.1145/2983323

General Chairs:
Snehasis Mukhopadhyay, Indiana University Purdue University Indianapolis, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA

Program Chairs:
Elisa Bertino, Purdue University
Fabio Crestani, University of Lugano
Javed Mostafa, University of North Carolina
Jie Tang, Tsinghua University
Luo Si, Alibaba Group Inc & Purdue University
Xiaofang Zhou, University of Queensland
Yi Chang, Yahoo Research
Yunyao Li, IBM Research - Almaden
Parikshit Sondhi, WalmartLabs

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM'16: ACM Conference on Information and Knowledge Management

October 24 - 28, 2016

Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 6
Total Downloads: 249
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Mar 2025

Citations

Kurland O, Culpepper J, Collins-Thompson K, Mei Q, Davison B, Liu Y, Yilmaz E (2018). Fusion in Information Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 1383-1386. DOI: 10.1145/3209978.3210186. Online publication date: 27-Jun-2018
https://dl.acm.org/doi/10.1145/3209978.3210186
Levi O, Guy I, Raiber F, Kurland O (2018). Selective Cluster Presentation on the Search Results Page. ACM Transactions on Information Systems, 36:3, pages 1-42. DOI: 10.1145/3158672. Online publication date: 28-Feb-2018
https://dl.acm.org/doi/10.1145/3158672
Kotoula P, Makris C (2018). Query Disambiguation Based on Clustering Techniques. Artificial Intelligence Applications and Innovations, pages 133-145. DOI: 10.1007/978-3-319-92016-0_13. Online publication date: 22-May-2018
https://doi.org/10.1007/978-3-319-92016-0_13
