ABSTRACT
We address the long-standing challenge of selective cluster-based retrieval: deciding, on a per-query basis, whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose several feature sets based on those used by the cluster-based ranker, on query-performance predictors, and on properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query-expansion, and term-proximity methods.
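The per-query decision described in the abstract can be illustrated with a minimal sketch. It assumes a linear classifier over a query's feature vector. The function name `select_ranking`, the weights, the bias, and the feature values are all hypothetical and chosen only for illustration; the abstract does not specify the classifier or the exact features, so this is not the authors' implementation.

```python
from typing import List, Sequence

Ranking = List[str]  # document IDs, best first


def select_ranking(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    cluster_based: Ranking,
    standard: Ranking,
) -> Ranking:
    """Pick one of two rankings for a query using a linear decision rule.

    Assumption: a positive score means cluster-based retrieval is
    predicted to be the better choice for this query.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = bias + sum(w * x for w, x in zip(weights, features))
    return cluster_based if score > 0 else standard


# Hypothetical model parameters and rankings for a single query.
weights = [0.8, -0.5, 0.3]
bias = -0.2
standard = ["d1", "d2", "d3"]
cluster_based = ["d2", "d3", "d1"]

# Made-up feature values (e.g. one per feature set) for this query.
chosen = select_ranking([0.9, 0.1, 0.4], weights, bias, cluster_based, standard)
```

In practice the weights would be learned from training queries labeled by which of the two rankings was more effective; any binary classifier could stand in for the linear rule used here.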
Index Terms
- Selective Cluster-Based Document Retrieval