Abstract
Source selection deals with the problem of selecting the most appropriate information sources from a set of available document collections that usually do not overlap. Data fusion techniques (also known as metasearch techniques), by contrast, aggregate the results from multiple document sources that usually overlap completely or partly, in order to provide wider coverage and more effective retrieval. In this paper we study some simple adaptations of traditional data fusion algorithms to the task of source selection in uncooperative distributed information retrieval environments. The experiments demonstrate that data fusion techniques perform comparably to state-of-the-art source selection algorithms at source selection tasks and often surpass them.
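To make the idea concrete, the following is a minimal sketch of how a data fusion rule might be turned into a source selection score. It is an illustration under stated assumptions, not the method evaluated in the paper: it assumes that a small sample of documents from each collection has been indexed centrally, that a query has been run against that sample index, and that each retrieved sample document counts as a vote for the collection it came from. CombSUM and CombMNZ are standard fusion rules; the function name `rank_sources`, its parameters, and the toy data are hypothetical.

```python
from collections import defaultdict


def rank_sources(ranked_sample_docs, doc_to_source, method="combsum"):
    """Rank collections by fusing the scores of their sampled documents.

    ranked_sample_docs: list of (doc_id, score) pairs returned by a query
        against a centralized index of sampled documents (assumed setup).
    doc_to_source: dict mapping each sampled doc_id to its collection name.
    method: "combsum" sums the scores per collection; "combmnz" multiplies
        that sum by the number of retrieved sample documents per collection.
    """
    totals = defaultdict(float)  # summed retrieval scores per collection
    counts = defaultdict(int)    # number of retrieved sample docs per collection

    for doc_id, score in ranked_sample_docs:
        source = doc_to_source[doc_id]
        totals[source] += score
        counts[source] += 1

    if method == "combsum":
        fused = dict(totals)
    elif method == "combmnz":
        fused = {s: totals[s] * counts[s] for s in totals}
    else:
        raise ValueError(f"unknown fusion method: {method}")

    # Highest fused score first; the top entries are the selected sources.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four sampled documents drawn from three collections.
results = [("d1", 0.9), ("d2", 0.8), ("d3", 0.5), ("d4", 0.4)]
doc_to_source = {"d1": "A", "d2": "B", "d3": "B", "d4": "C"}

for name, score in rank_sources(results, doc_to_source, method="combmnz"):
    print(name, round(score, 2))
```

In this toy run collection B ranks first under both rules because two of its sampled documents were retrieved. CombMNZ rewards that breadth more strongly than CombSUM, which is the kind of design choice that distinguishes one fusion adaptation from another.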
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paltoglou, G., Salampasis, M., Satratzemi, M. (2009). Simple Adaptations of Data Fusion Algorithms for Source Selection. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_44
DOI: https://doi.org/10.1007/978-3-642-00958-7_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)