DOI: 10.1145/3409256.3409825
research-article

Cluster-Based Document Retrieval with Multiple Queries

Published: 14 September 2020

Abstract

Several recent studies have demonstrated that retrieval effectiveness can be improved by using multiple queries that represent the same information need. In this paper we present the first study of utilizing multiple queries in cluster-based document retrieval; that is, using information induced from clusters of similar documents to rank documents. Specifically, we propose a conceptual framework of retrieval templates that adapt cluster-based document retrieval methods, originally devised for a single query, to leverage multiple queries. The adaptations operate at the query, document-list and similarity-estimate levels. Retrieval methods are instantiated from the templates by selecting, for example, the clustering algorithm and the cluster-based retrieval method. Empirical evaluation attests to the merits of the retrieval templates with respect to very strong baselines: state-of-the-art cluster-based retrieval with a single query, and highly effective fusion of document lists retrieved for multiple queries. In addition, we present findings about the impact of the effectiveness of the queries used to represent an information need on (i) cluster hypothesis test results, (ii) the percentage of relevant documents in clusters of similar documents, and (iii) the effectiveness of state-of-the-art cluster-based retrieval methods.
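
To make the document-list-level adaptation concrete, here is a minimal Python sketch. It is not the authors' implementation; it only illustrates the general idea under stated assumptions. The steps are:

1. The ranked lists retrieved for several query variants are fused with reciprocal rank fusion (Cormack, Clarke and Büttcher, SIGIR 2009), using the commonly cited constant k = 60.
2. One overlapping nearest-neighbor cluster is formed around each document at the top of the fused list.
3. Clusters are ranked by the geometric mean of their members' fused scores.
4. Documents are emitted cluster by cluster, skipping repeats.

The function names, the term-frequency cosine similarity, the cluster size and the cluster-scoring rule are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (assumed representation)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nn_clusters(doc_ids, vectors, size):
    """One overlapping cluster per document: itself plus its size-1 nearest neighbors."""
    clusters = []
    for d in doc_ids:
        others = sorted((o for o in doc_ids if o != d),
                        key=lambda o: cosine(vectors[d], vectors[o]), reverse=True)
        clusters.append([d] + others[:size - 1])
    return clusters


def cluster_rerank(ranked_lists, vectors, top_n=10, cluster_size=3):
    """Fuse per-query lists, cluster the top of the fused list, rank the clusters,
    and emit documents cluster by cluster without repeats (hypothetical pipeline)."""
    fused = rrf_fuse(ranked_lists)
    top = sorted(fused, key=fused.get, reverse=True)[:top_n]

    def cluster_score(cluster):
        # Geometric mean of the members' fused scores (an illustrative choice).
        product = 1.0
        for d in cluster:
            product *= fused[d]
        return product ** (1.0 / len(cluster))

    ranked = sorted(nn_clusters(top, vectors, cluster_size), key=cluster_score, reverse=True)
    result, seen = [], set()
    for cluster in ranked:
        # Within a cluster, place members in order of their fused scores.
        for d in sorted(cluster, key=fused.get, reverse=True):
            if d not in seen:
                seen.add(d)
                result.append(d)
    return result


if __name__ == "__main__":
    vectors = {
        "r1": Counter("cluster hypothesis retrieval".split()),
        "r2": Counter("cluster based retrieval".split()),
        "n1": Counter("weather forecast".split()),
        "n2": Counter("weather report".split()),
    }
    # Hypothetical rankings retrieved for two variants of the same information need.
    runs = [["r1", "n1", "r2", "n2"], ["r2", "r1", "n2", "n1"]]
    print(cluster_rerank(runs, vectors, top_n=4, cluster_size=2))
```

The query-level and similarity-estimate-level adaptations mentioned in the abstract would change the inputs rather than the clustering step. For example, they could pool the terms of all variants into one query, or average a document's similarity to each variant. The sketch above does not cover them.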

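Findings (i) and (ii) concern cluster hypothesis tests and the share of relevant documents in clusters. The following Python sketch computes two illustrative measures within a single retrieved list:

- **Nearest-neighbor test**, in the spirit of Voorhees (SIGIR 1985): the average number of relevant documents among the k nearest neighbors of each relevant document.
- **Relevant-document percentage**: the average percentage of relevant documents in the nearest-neighbor cluster formed around each document.

This is an assumption-laden sketch rather than the paper's exact evaluation protocol. The helper names, the term-frequency cosine similarity and the parameter defaults are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (assumed representation)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_neighbors(d, doc_ids, vectors, k):
    """The k documents in doc_ids most similar to d, excluding d itself."""
    others = [o for o in doc_ids if o != d]
    others.sort(key=lambda o: cosine(vectors[d], vectors[o]), reverse=True)
    return others[:k]


def nn_test(doc_ids, vectors, relevant, k=5):
    """Average number of relevant documents among the k nearest neighbors
    (within the list) of each relevant document in the list."""
    rel_in_list = [d for d in doc_ids if d in relevant]
    if not rel_in_list:
        return 0.0
    counts = [sum(1 for n in nearest_neighbors(d, doc_ids, vectors, k) if n in relevant)
              for d in rel_in_list]
    return sum(counts) / len(counts)


def mean_relevant_fraction(doc_ids, vectors, relevant, cluster_size=5):
    """Average percentage of relevant documents in the cluster formed by each
    document and its cluster_size-1 nearest neighbors."""
    fractions = []
    for d in doc_ids:
        cluster = [d] + nearest_neighbors(d, doc_ids, vectors, cluster_size - 1)
        fractions.append(100.0 * sum(1 for c in cluster if c in relevant) / len(cluster))
    return sum(fractions) / len(fractions) if fractions else 0.0


if __name__ == "__main__":
    vectors = {
        "r1": Counter("cluster hypothesis retrieval".split()),
        "r2": Counter("cluster based retrieval".split()),
        "n1": Counter("weather forecast".split()),
        "n2": Counter("weather report".split()),
    }
    ids = ["r1", "n1", "r2", "n2"]  # hypothetical retrieved list
    relevant = {"r1", "r2"}         # hypothetical relevance judgments
    print(nn_test(ids, vectors, relevant, k=1))
    print(mean_relevant_fraction(ids, vectors, relevant, cluster_size=2))
```

Higher values of either measure indicate that relevant documents tend to be more similar to each other than to non-relevant ones within the list. Running both measures on lists retrieved for more and less effective query variants is one way to probe the relationship the abstract describes.
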

Cited By

  • (2020) Multilinguistic approach towards Information Retrieval System for Big Data. 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), pp. 159-164. DOI: 10.1109/ICISS49785.2020.9315969. Online publication date: 3 December 2020.

    Published In

    ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval
    September 2020
    207 pages
    ISBN:9781450380676
    DOI:10.1145/3409256

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cluster hypothesis
    2. cluster-based document retrieval
    3. document retrieval

    Conference

    ICTIR '20

    Acceptance Rates

    Overall Acceptance Rate 235 of 527 submissions, 45%

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 02 Mar 2025