Segmentation of Search Engine Results for Effective Data-Fusion

Shokouhi, Milad

doi:10.1007/978-3-540-71496-5_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Metasearch and data-fusion techniques combine the rank lists of multiple document retrieval systems with the aim of improving search coverage and precision.

We propose a new fusion method that partitions the rank lists of document retrieval systems into chunks. The size of chunks grows exponentially in the rank list. Using a small number of training queries, the probabilities of relevance of documents in different chunks are approximated for each search system. The estimated probabilities and normalized document scores are used to compute the final document ranks in the merged list. We show that our proposed method produces higher average precision values than previous systems across a range of testbeds.
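The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact chunk sizes, the normalization scheme, or the combination formula, so everything concrete here is an assumption made for illustration: a hypothetical `base` chunk size that doubles at each step, min-max score normalization, and a fused score computed as the sum over systems of the chunk probability times (1 + normalized score). The function names are likewise hypothetical and should not be read as the paper's implementation.

```python
from collections import defaultdict


def chunk_index(rank, base=5):
    """Map a 1-based rank to a 0-based chunk index.

    Assumption: chunk k holds base * 2**k ranks, so sizes grow
    exponentially (5, 10, 20, ... for base=5).
    """
    k, upper = 0, base
    while rank > upper:
        k += 1
        upper += base * 2 ** k
    return k


def estimate_chunk_probs(training_runs, relevant, base=5):
    """Estimate P(relevant | chunk) for ONE system from training queries.

    training_runs: list of ranked doc-id lists, one per training query.
    relevant: list of sets of relevant doc ids, aligned with training_runs.
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for ranking, rel in zip(training_runs, relevant):
        for rank, doc in enumerate(ranking, start=1):
            k = chunk_index(rank, base)
            seen[k] += 1
            hits[k] += int(doc in rel)
    return {k: hits[k] / seen[k] for k in seen}


def min_max(scores):
    """Min-max normalize a {doc: score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(results, chunk_probs, base=5):
    """Merge per-system rankings into one list of doc ids.

    results: {system: [(doc, score), ...]} sorted by descending score.
    chunk_probs: {system: {chunk_index: probability}}.
    Assumed combination rule: sum over systems of p_chunk * (1 + norm_score).
    """
    fused = defaultdict(float)
    for system, ranking in results.items():
        norm = min_max(dict(ranking))
        for rank, (doc, _) in enumerate(ranking, start=1):
            p = chunk_probs[system].get(chunk_index(rank, base), 0.0)
            fused[doc] += p * (1.0 + norm[doc])
    return sorted(fused, key=fused.get, reverse=True)
```

In this sketch, adding 1 to the normalized score keeps the lowest-scored document in each list from contributing zero, so the chunk probability still counts for it; other combination rules would fit the abstract's description equally well.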

Author information

Authors and Affiliations

School of Computer Science and Information Technology, RMIT University, Melbourne 3001, Australia
Milad Shokouhi

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Cite this paper

Shokouhi, M. (2007). Segmentation of Search Engine Results for Effective Data-Fusion. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_19

DOI: https://doi.org/10.1007/978-3-540-71496-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)