DOI: 10.1145/2970398.2970419
Research article
Public Access

PDF: A Probabilistic Data Fusion Framework for Retrieval and Ranking

Published: 12 September 2016

ABSTRACT

Data fusion has been shown to be a simple and effective way to improve retrieval results. Most existing data fusion methods combine ranked lists from different retrieval functions for a single given query. But in many real search settings, the diversity of retrieval functions required to achieve good fusion performance is not available. Researchers are typically limited to a few variants on a scoring function used by the engine of their choice, with these variants often producing similar results due to being based on the same underlying term statistics.

This paper presents a framework for data fusion based on combining ranked lists from different queries that users could have entered for their information need. If we can identify a set of "possible queries" for an information need, and estimate the probability of generating those queries, the probability of retrieving particular documents for those queries, and the probability of those documents being relevant to the information need, we have the potential to dramatically improve results over a baseline system given a single user query. Our framework is built from several component models that can be mixed and matched, and we present simple estimation methods for these components. To demonstrate effectiveness, we present experimental results on five datasets covering tasks such as ad-hoc search, novelty and diversity search, and search in the presence of implicit user feedback. Our results show strong performance for our method: it is competitive with state-of-the-art methods on the same datasets, and in some cases outperforms them.
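To make the combination concrete, the sketch below marginalizes a document's fused score over a set of possible queries, multiplying an estimated probability of the user issuing each query by the probability of retrieving the document for that query and the probability of the document being relevant. This is a minimal illustration only: the callables `retrieve`, `p_query`, and `p_relevant`, and the score-normalization step used as a stand-in for the retrieval probability, are assumptions for the sketch, not the component estimators defined in the paper.

```python
from collections import defaultdict

def fuse(possible_queries, retrieve, p_query, p_relevant, k=100):
    """Rank documents by marginalizing over a set of possible queries."""
    fused = defaultdict(float)
    for q in possible_queries:
        ranked = retrieve(q, k)  # list of (doc_id, retrieval_score) pairs
        # Normalize retrieval scores over the top-k list as a crude stand-in
        # for P(retrieve d | q); the paper's component models estimate this differently.
        total = sum(score for _, score in ranked) or 1.0
        for doc_id, score in ranked:
            fused[doc_id] += p_query(q) * (score / total) * p_relevant(doc_id, q)
    # Final ranking: documents ordered by descending fused probability mass.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With a uniform `p_query` and a constant `p_relevant`, this reduces to averaging normalized retrieval scores across the query variants, i.e. a CombSUM-style baseline; the interesting cases arise when the query and relevance probabilities are estimated rather than held fixed.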



Published in

          ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
          September 2016
          318 pages
ISBN: 9781450344975
DOI: 10.1145/2970398

          Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

ICTIR '16 paper acceptance rate: 41 of 79 submissions, 52%. Overall acceptance rate: 209 of 482 submissions, 43%.

