Simple Adaptations of Data Fusion Algorithms for Source Selection

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

Source selection is the problem of choosing the most appropriate information sources from a set of available document collections, which usually do not intersect. Data fusion techniques (also known as metasearch techniques), in contrast, aggregate the results from multiple document sources, which usually intersect completely or partly, in order to provide wider coverage and more effective retrieval. In this paper we study some simple adaptations of traditional data fusion algorithms to the task of source selection in uncooperative distributed information retrieval environments. The experiments demonstrate that data fusion techniques perform comparably to state-of-the-art source selection algorithms at source selection tasks and are often able to surpass them.
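The abstract does not name the fusion algorithms that were adapted, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that each source is represented by a sample of its documents, that the sampled documents have been ranked for a query by some central retrieval function, and that each retrieved sample document then contributes its score to the source it came from. CombSUM and CombMNZ are well-known data fusion rules chosen here for illustration; the function name `rank_sources` and the input format are hypothetical.

```python
from collections import defaultdict


def rank_sources(sample_ranking, rule="combsum"):
    """Rank sources by fusing the scores of their retrieved sample documents.

    sample_ranking: list of (source_id, score) pairs, one per sampled
        document retrieved for the query (hypothetical input format).
    rule: "combsum" sums the scores per source; "combmnz" multiplies that
        sum by the number of retrieved documents from the source.
    Returns a list of (source_id, fused_score), best source first.
    """
    totals = defaultdict(float)  # sum of document scores per source
    counts = defaultdict(int)    # number of retrieved documents per source

    for source_id, score in sample_ranking:
        totals[source_id] += score
        counts[source_id] += 1

    if rule == "combsum":
        fused = dict(totals)
    elif rule == "combmnz":
        fused = {s: totals[s] * counts[s] for s in totals}
    else:
        raise ValueError(f"unknown fusion rule: {rule!r}")

    # Sort by fused score, highest first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Source "A" has one strong document; source "B" has two weaker ones.
ranking = [("A", 1.0), ("B", 0.4), ("B", 0.4)]
combsum_order = [s for s, _ in rank_sources(ranking, "combsum")]
combmnz_order = [s for s, _ in rank_sources(ranking, "combmnz")]
```

In this toy input, CombSUM prefers the source with the single high-scoring document, while CombMNZ prefers the source with more retrieved documents. In practice the document scores would normally be normalized before fusion; that step is omitted here.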

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paltoglou, G., Salampasis, M., Satratzemi, M. (2009). Simple Adaptations of Data Fusion Algorithms for Source Selection. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_44

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
