PDF: A Probabilistic Data Fusion Framework for Retrieval and Ranking

ABSTRACT
Data fusion has been shown to be a simple and effective way to improve retrieval results. Most existing data fusion methods combine ranked lists produced by different retrieval functions for a single given query. In many real search settings, however, the diversity of retrieval functions needed for good fusion performance is not available: researchers are typically limited to a few variants of the scoring function used by their engine of choice, and these variants often produce similar results because they rely on the same underlying term statistics.
This paper presents a framework for data fusion that combines ranked lists from different queries a user could have entered for the same information need. If we can identify a set of "possible queries" for an information need and estimate the probability of generating each of those queries, the probability of retrieving particular documents for each query, and the probability that those documents are relevant to the information need, we have the potential to dramatically improve results over a baseline system given a single user query. Our framework is built from several component models that can be mixed and matched, and we present simple estimation methods for each component. To demonstrate effectiveness, we report experimental results on five datasets covering tasks such as ad-hoc search, novelty and diversity search, and search in the presence of implicit user feedback. Our method performs strongly: it is competitive with state-of-the-art methods on the same datasets, and in some cases outperforms them.
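The core idea described above — scoring a document by marginalizing over a distribution of possible queries — can be sketched as follows. This is an illustrative implementation only: the function name `fuse_possible_queries` and the use of within-list score normalization as a stand-in for the retrieval probabilities are assumptions for the sketch, not the paper's actual component models or estimators.

```python
from collections import defaultdict

def fuse_possible_queries(ranked_lists, query_probs):
    """Fuse ranked lists retrieved for several "possible queries"
    belonging to one information need.

    ranked_lists: dict mapping query -> list of (doc_id, retrieval_score)
    query_probs:  dict mapping query -> P(query | information need)

    Scores each document by marginalizing over queries:
        score(d) = sum_q P(q | need) * P(d | q)
    where P(d | q) is approximated here by normalizing retrieval
    scores within each list (an illustrative choice).
    """
    fused = defaultdict(float)
    for q, docs in ranked_lists.items():
        # Normalize scores within this query's list so they sum to 1,
        # giving a crude per-query retrieval distribution over documents.
        total = sum(score for _, score in docs) or 1.0
        for doc_id, score in docs:
            fused[doc_id] += query_probs.get(q, 0.0) * (score / total)
    # Return documents ranked by fused score, highest first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

A document retrieved by several likely queries accumulates probability mass from each of them, so it rises above documents that score well for only one query — the same intuition behind classical CombSUM-style fusion, but weighted by the query-generation distribution rather than treating all input lists equally.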