PDF: A Probabilistic Data Fusion Framework for Retrieval and Ranking

ABSTRACT
Data fusion has been shown to be a simple and effective way to improve retrieval results. Most existing data fusion methods combine ranked lists produced by different retrieval functions for a single given query. In many real search settings, however, the diversity of retrieval functions needed for good fusion performance is not available: researchers are typically limited to a few variants of the scoring function used by their engine of choice, and these variants often produce similar results because they rely on the same underlying term statistics.
This paper presents a framework for data fusion that combines ranked lists from different queries a user could have entered for the same information need. If we can identify a set of "possible queries" for an information need and estimate the probability of generating each of those queries, the probability of retrieving particular documents for each query, and the probability that those documents are relevant to the information need, we have the potential to dramatically improve results over a baseline system given a single user query. Our framework is built from several component models that can be mixed and matched, and we present simple estimation methods for each component. To demonstrate effectiveness, we report experimental results on five datasets covering tasks such as ad-hoc search, novelty and diversity search, and search in the presence of implicit user feedback. Our method performs strongly: it is competitive with state-of-the-art methods on the same datasets, and in some cases outperforms them.
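The core idea described above — scoring a document by marginalizing over a distribution of possible queries — can be sketched as follows. This is an illustrative implementation only: the function name `fuse_possible_queries` and the use of within-list score normalization as a stand-in for the retrieval probabilities are assumptions for the sketch, not the paper's actual component models or estimators.

```python
from collections import defaultdict

def fuse_possible_queries(ranked_lists, query_probs):
    """Fuse ranked lists retrieved for several "possible queries"
    belonging to one information need.

    ranked_lists: dict mapping query -> list of (doc_id, retrieval_score)
    query_probs:  dict mapping query -> P(query | information need)

    Scores each document by marginalizing over queries:
        score(d) = sum_q P(q | need) * P(d | q)
    where P(d | q) is approximated here by normalizing retrieval
    scores within each list (an illustrative choice).
    """
    fused = defaultdict(float)
    for q, docs in ranked_lists.items():
        # Normalize scores within this query's list so they sum to 1,
        # giving a crude per-query retrieval distribution over documents.
        total = sum(score for _, score in docs) or 1.0
        for doc_id, score in docs:
            fused[doc_id] += query_probs.get(q, 0.0) * (score / total)
    # Return documents ranked by fused score, highest first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

A document retrieved by several likely queries accumulates probability mass from each of them, so it rises above documents that score well for only one query — the same intuition behind classical CombSUM-style fusion, but weighted by the query-generation distribution rather than treating all input lists equally.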