research-article

Cluster-Based Document Retrieval with Multiple Queries

Authors:

Kfir Bernstein,

J. Shane Culpepper

ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval

Pages 33 - 40

https://doi.org/10.1145/3409256.3409825

Published: 14 September 2020

Abstract

The merits of using multiple queries representing the same information need to improve retrieval effectiveness have recently been demonstrated in several studies. In this paper we present the first study of utilizing multiple queries in cluster-based document retrieval; that is, using information induced from clusters of similar documents to rank documents. Specifically, we propose a conceptual framework of retrieval templates that can adapt cluster-based document retrieval methods, originally devised for a single query, to leverage multiple queries. The adaptations operate at the query, document list and similarity-estimate levels. Retrieval methods are instantiated from the templates by selecting, for example, the clustering algorithm and the cluster-based retrieval method. Empirical evaluation attests to the merits of the retrieval templates with respect to very strong baselines: state-of-the-art cluster-based retrieval with a single query and highly effective fusion of document lists retrieved for multiple queries. In addition, we present findings about the impact of the effectiveness of queries used to represent an information need on (i) cluster hypothesis test results, (ii) percentage of relevant documents in clusters of similar documents, and (iii) effectiveness of state-of-the-art cluster-based retrieval methods.
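To make the idea of fusing document lists retrieved for multiple queries concrete, the following is a minimal sketch of reciprocal rank fusion, one widely used fusion technique. It is an illustration only and is not claimed to be the fusion method or any retrieval template evaluated in the paper. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the document identifiers are assumptions chosen for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list is assumed to be the result retrieved for one query
    variant of the same information need. k is a smoothing constant;
    60 is a common default, used here as an assumption.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it is ranked.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for three variants of one information need.
lists = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(lists)
print(fused)
```

In this toy input, `d2` is ranked first by two of the three variants and therefore leads the fused list, while `d4`, retrieved for only one variant, comes last.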

Cited By

A. Xain, A. Goyal, B. Singh, and S. Sharma. 2020. Multilinguistic approach towards Information Retrieval System for Big Data. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS). 159--164. Online publication date: 3-Dec-2020.
https://doi.org/10.1109/ICISS49785.2020.9315969

Index Terms

Cluster-Based Document Retrieval with Multiple Queries
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval

September 2020

207 pages

ISBN:9781450380676

DOI:10.1145/3409256

General Chairs: Krisztian Balog (University of Stavanger, Norway), Vinay Setty (University of Stavanger, Norway)

Program Chairs: Christina Lioma (University of Copenhagen, Denmark), Yiqun Liu (Tsinghua University, China), Min Zhang (Tsinghua University, China), Klaus Berberich (HTW Saar & MPI for Informatics, Germany)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICTIR '20: The 2020 ACM SIGIR International Conference on the Theory of Information Retrieval

Sponsor: SIGIR

September 14 - 17, 2020

Virtual Event, Norway

Acceptance Rates

Overall Acceptance Rate 235 of 527 submissions, 45%
