Abstract
Metasearch and data-fusion techniques combine the rank lists of multiple document retrieval systems with the aim of improving search coverage and precision.
We propose a new fusion method that partitions the rank list of each document retrieval system into chunks whose size grows exponentially with rank position. Using a small number of training queries, the probability of relevance of documents in each chunk is estimated separately for each search system. The estimated probabilities and normalized document scores are then used to compute the final document ranks in the merged list. We show that our method produces higher average precision than previous data-fusion methods across a range of testbeds.
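The abstract does not state the exact chunk-size rule, the probability estimator, or the scoring formula. The Python sketch below therefore fills them in with assumptions and should not be read as the paper's implementation. Chunk k (0-based) holds `base * 2**k` ranks, with a hypothetical default `base = 10`. A chunk's probability of relevance is taken to be the fraction of its training documents that are judged relevant. Scores are min-max normalized within each list. A document's fused score is the sum over systems of `p * (1 + s_norm)`, which ensures that the lowest-scored document in a list still receives its chunk probability. All function names and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def chunk_index(rank: int, base: int = 10) -> int:
    """Map a 1-based rank to a 0-based chunk.

    Assumed growth rule: chunk k holds base * 2**k ranks, so with base=10
    the chunks cover ranks 1-10, 11-30, 31-70, 71-150, ...
    """
    k, upper = 0, base
    while rank > upper:
        k += 1
        upper += base * 2 ** k
    return k


def estimate_chunk_probs(training_runs: List[List[str]],
                         qrels: List[Set[str]],
                         base: int = 10) -> Dict[int, float]:
    """Estimate P(relevant | chunk) for ONE system from training queries.

    training_runs[i] is the system's ranked doc-id list for query i, and
    qrels[i] is the set of doc ids judged relevant for that query.
    """
    relevant = defaultdict(int)
    total = defaultdict(int)
    for ranking, judged in zip(training_runs, qrels):
        for rank, doc in enumerate(ranking, start=1):
            k = chunk_index(rank, base)
            total[k] += 1
            relevant[k] += int(doc in judged)
    return {k: relevant[k] / total[k] for k in total}


def min_max(scores: List[float]) -> List[float]:
    """Min-max normalize scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(runs: Dict[str, List[Tuple[str, float]]],
         chunk_probs: Dict[str, Dict[int, float]],
         base: int = 10) -> List[str]:
    """Merge per-system rankings into a single ranked list of doc ids.

    runs[system] is a list of (doc, score) pairs sorted by descending score.
    Assumed combination rule: sum over systems of p * (1 + normalized score).
    """
    fused: Dict[str, float] = defaultdict(float)
    for system, ranking in runs.items():
        if not ranking:
            continue
        norm = min_max([score for _, score in ranking])
        probs = chunk_probs.get(system, {})
        for rank, ((doc, _), s_norm) in enumerate(zip(ranking, norm), start=1):
            p = probs.get(chunk_index(rank, base), 0.0)
            fused[doc] += p * (1.0 + s_norm)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))
```

For example, suppose system A (estimated top-chunk probability 0.8) ranks `x` above `y`, and system B (probability 0.2) ranks `y` above `x`. Then `fuse` places `x` first, because the system judged more reliable on the training queries contributes more weight. Growing the chunks exponentially is meant to keep the top ranks finely resolved while still collecting enough training documents per chunk deeper in the list.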
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Shokouhi, M. (2007). Segmentation of Search Engine Results for Effective Data-Fusion. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_19
DOI: https://doi.org/10.1007/978-3-540-71496-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)