Probability-based fusion of information retrieval result sets

Artificial Intelligence Review

Abstract

Information Retrieval (IR) forms the basis of many information management tasks, and information management has itself become extremely important as the amount of electronically available information increases dramatically. The IR task can be performed in numerous ways, both by applying different techniques and by using different representations of the available information. Some algorithms have been shown to outperform others on certain tasks, and combining the results produced by different algorithms has yielded superior retrieval performance, making such combination an important research area. This paper introduces probFuse, a probability-based fusion technique that shows initial promise in addressing this problem, and compares it with the common CombMNZ data fusion technique.
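For readers unfamiliar with the CombMNZ baseline named in the abstract, the following is a minimal sketch of how it is commonly described: each document's normalised scores are summed across the input result sets, and the sum is multiplied by the number of result sets that gave the document a non-zero score. The abstract does not specify these details, so the min-max normalisation step, the function names (`min_max_normalise`, `comb_mnz`) and the dictionary input format are assumptions made for illustration; this is not the authors' implementation, and it does not attempt to reproduce probFuse.

```python
from collections import defaultdict


def min_max_normalise(scores):
    """Rescale one system's scores to [0, 1] (an assumed normalisation step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every returned document alike.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_sets):
    """Fuse result sets given as dicts mapping document id -> raw score.

    Returns (document id, fused score) pairs, best first.
    """
    total = defaultdict(float)  # sum of normalised scores per document
    hits = defaultdict(int)     # number of systems with a non-zero score
    for scores in result_sets:
        for doc, s in min_max_normalise(scores).items():
            total[doc] += s
            if s > 0:
                hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result sets from two retrieval systems.
system_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
system_b = {"d1": 10.0, "d3": 5.0, "d4": 0.0}
ranking = comb_mnz([system_a, system_b])
```

In this toy input, documents returned with non-zero scores by both systems are boosted relative to those found by only one, which is the behaviour that makes CombMNZ a natural comparison point for a fusion technique.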

Author information

Corresponding author

Correspondence to D. Lillis.

About this article

Cite this article

Lillis, D., Toolan, F., Mur, A. et al. Probability-based fusion of information retrieval result sets. Artif Intell Rev 25, 179–191 (2006). https://doi.org/10.1007/s10462-007-9021-x

  • DOI: https://doi.org/10.1007/s10462-007-9021-x
