Segmentation of Search Engine Results for Effective Data-Fusion

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

Metasearch and data-fusion techniques combine the rank lists of multiple document retrieval systems with the aim of improving search coverage and precision.

We propose a new fusion method that partitions the rank list of each document retrieval system into chunks whose sizes grow exponentially with rank depth. Using a small number of training queries, the probability of relevance of documents in each chunk is estimated for each search system. The estimated probabilities and normalized document scores are then used to compute the final document ranks in the merged list. We show that our proposed method produces higher average precision values than previous methods across a range of testbeds.
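
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state the growth base of the chunks, the probability estimator, the score normalization, or the rule that combines them. The doubling chunk sizes, the fraction-of-relevant estimator, min-max normalization, and the summed product `p * score` are all assumptions, and the names `chunk_index`, `estimate_chunk_probs`, `normalize`, and `fuse` are hypothetical.

```python
from collections import defaultdict


def chunk_index(rank, base=2):
    """Map a 1-based rank to a 0-based chunk; chunk k holds base**k ranks.

    Assumed growth: with base=2 the chunks cover ranks 1, 2-3, 4-7, 8-15, ...
    """
    k, upper = 0, 1
    while rank > upper:
        k += 1
        upper += base ** k
    return k


def estimate_chunk_probs(training_runs, relevant):
    """Estimate P(relevant | chunk) for one system from training queries.

    training_runs: {query: [doc ids in rank order]}
    relevant:      {query: set of relevant doc ids}
    Assumed estimator: fraction of documents in the chunk that are relevant.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for query, ranking in training_runs.items():
        rel = relevant.get(query, set())
        for rank, doc in enumerate(ranking, start=1):
            k = chunk_index(rank)
            totals[k] += 1
            if doc in rel:
                hits[k] += 1
    return {k: hits[k] / totals[k] for k in totals}


def normalize(scores):
    """Min-max normalize scores to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores), max(scores)
    return [1.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def fuse(results, chunk_probs):
    """Merge ranked lists from several systems into one ranking.

    results:     {system: [(doc, score), ...] in rank order}
    chunk_probs: {system: {chunk: probability of relevance}}
    Assumed combination: sum over systems of chunk probability * normalized score.
    """
    fused = defaultdict(float)
    for system, ranking in results.items():
        if not ranking:
            continue
        norm = normalize([score for _, score in ranking])
        for rank, ((doc, _), ns) in enumerate(zip(ranking, norm), start=1):
            p = chunk_probs[system].get(chunk_index(rank), 0.0)
            fused[doc] += p * ns
    return sorted(fused, key=fused.get, reverse=True)
```

In use, `estimate_chunk_probs` would be run once per system on the training queries, and the resulting per-system dictionaries passed to `fuse` together with each system's scored results for a new query. Because chunks widen with depth, the few training queries give fine-grained estimates near the top of the list and coarser, better-populated estimates further down.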

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Shokouhi, M. (2007). Segmentation of Search Engine Results for Effective Data-Fusion. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_19

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science (R0)
