Abstract
Information Retrieval (IR) forms the basis of many information management tasks. Information management has itself become an extremely important area as the amount of electronically available information increases dramatically. There are numerous ways of performing the IR task, both by using different techniques and by using different representations of the available information. It has been shown that some algorithms outperform others on certain tasks. Combining the results produced by different algorithms has resulted in superior retrieval performance, and this has become an important research area. This paper introduces a probability-based fusion technique, probFuse, that shows initial promise in addressing the problem of combining result sets. It also compares probFuse with the common CombMNZ data fusion technique.
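For readers unfamiliar with the CombMNZ baseline mentioned above, the following is a minimal sketch of how it is commonly described: each system's scores are normalised, a document's normalised scores are summed across systems, and the sum is multiplied by the number of systems that returned the document. The function names `min_max_normalise` and `comb_mnz`, the choice of min-max normalisation, and the dictionary-based input format are assumptions made for illustration; they are not taken from the paper, and the sketch does not attempt to reproduce probFuse itself.

```python
from collections import defaultdict


def min_max_normalise(scores):
    """Rescale one system's scores to [0, 1] (assumed normalisation scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every returned document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_sets):
    """Fuse result sets with CombMNZ.

    result_sets: list of dicts mapping document id -> raw score,
    one dict per retrieval system (hypothetical input format).
    Returns (document id, fused score) pairs, best first.
    """
    total = defaultdict(float)  # sum of normalised scores per document
    hits = defaultdict(int)     # number of systems that returned the document
    for scores in result_sets:
        if not scores:
            continue
        for doc, s in min_max_normalise(scores).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    system_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
    system_b = {"d2": 10.0, "d4": 5.0}
    for doc, score in comb_mnz([system_a, system_b]):
        print(doc, score)
```

In this toy input, a document returned by both systems is boosted by the multiplier, which is the property that makes CombMNZ favour documents on which several systems agree.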
Cite this article
Lillis, D., Toolan, F., Mur, A. et al. Probability-based fusion of information retrieval result sets. Artif Intell Rev 25, 179–191 (2006). https://doi.org/10.1007/s10462-007-9021-x
DOI: https://doi.org/10.1007/s10462-007-9021-x