Skip to main content
Log in

Metrics for evaluating database selection techniques

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

The increasing availability of online databases and other information resources in digital libraries and on the World Wide Web has created the need for efficient and effective algorithms for selecting databases to search. A number of techniques have been proposed for query routing or database selection. We have developed a methodology and metrics that can be used to directly compare competing techniques. They can also be used to isolate factors that influence the performance of these techniques so that we can better understand performance issues. In this paper we describe the methodology we have used to examine the performance of database selection algorithms such as gGlOSS and CORI. In addition we develop the theory behind a “random” database selection algorithm and show how it can be used to help analyze the behavior of realistic database selection algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • Callan, J., A.L. Powell, J.C. French, and M. Connell (2000), “The Effects of Query-Based Sampling on Automatic Database Selection Algorithms,” Technical Report CMU-LTI-00-162, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.

  • Callan, J.P., Z. Lu, and W.B. Croft (1995), “Searching Distributed Collections with Inference Networks,” In Proceedings of the 18th International Conference on Research and Development in Information Retrieval, pp. 21–29.

  • French, J.C., A.L. Powell, C.L. Viles, T. Emmitt, and K.J. Prey (1998), “Evaluating Database Selection Techniques: A Testbed and Experiment,” In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, W.B. Croft, A. Moffat, and C.J. van Rijsbergen, Eds., Melbourne, Australia, pp. 121–129.

  • French, J.C., A.L. Powell, and J. Callan (1999a), “Effective and Efficient Automatic Database Selection,” Technical Report CS-99-08, Department of Computer Science, University of Virginia.

  • French, J.C., A.L. Powell, J. Callan, C.L. Viles, T. Emmitt, K.J. Prey, and Y. Mou (1999b), “Comparing the Performance of Database Selection Algorithms,” In Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 238–245.

  • Fuhr, N. (1999), “A Decision-Theoretic Approach to Database Selection in Networked IR,” ACM Transactions on Information Systems 17, 3, 229–249.

    Article  Google Scholar 

  • Gibbons, J.D. (1976), Nonparametric Methods for Quantitative Analysis, Holt, Rinehart and Winston.

  • Gravano, L. and H. García-Molina (1995), “Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies,” In Proceedings of the 21st International Conference on Very Large Databases (VLDB), Zurich, Switzerland, pp. 78–89.

  • Gravano, L., H. García-Molina, and A. Tomasic (1994a), “Precision and Recall of GlOSS Estimators for Database Discovery,” In Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, Austin, TX, pp. 103–106.

  • Gravano, L., H. García-Molina, and A. Tomasic (1994b), “The Effectiveness of GlOSS for the Text Database Discovery Problem,” In SIGMOD 94, Minneapolis, MN, pp. 126–137.

  • Gravano, L., H. García-Molina, and A. Tomasic (1999), “GlOSS: Text-Source Discovery over the Internet,” ACM Transactions on Database Systems, to appear.

  • Harman, D. (1996), “Overview of the Fourth Text Retrieval Conference (TREC-4),” In Proceedings of the 4th Text Retrieval Conference (TREC-4), Gaithersburg, MD.

  • Hawking, D. and P. Thistlewaite (1999), “Methods for Information Server Selection,” ACM Transactions on Information Systems 17, 1, 40–76.

    Article  Google Scholar 

  • Larson, H.J. (1974), Introduction to Probability Theory and Statistical Inference, 2nd Edition, Wiley, New York.

    MATH  Google Scholar 

  • Losee, R.M. (1995), “Determining Information Retrieval and Filtering Performance without Experimentation,” Information Processing & Management 31, 4, 555–572.

    Article  Google Scholar 

  • Lu, Z., J.P. Callan, and W.B. Croft (1996), “Measures in Collection Ranking Evaluation,” Technical Report TR-96-39, Computer Science Department, University of Massachusetts.

  • Meng, W., K.-L. Liu, C. Yu, X. Wang, Y. Chang, and N. Rishe (1998), “Determining Text Databases to Search in the Internet,” In Proceedings of the 24th VLDB Conference, New York, pp. 14–25.

  • Moffat, A. and J. Zobel (1995), “Information Retrieval Systems for Large Document Collections,” In Proceedings of the 3rd Text Retrieval Conference (TREC-3), Gaithersburg, MD, pp. 85–94.

  • Powell, A.L., J.C. French, J. Callan, M. Connell, and C.L. Viles (2000), “Measuring the Impact of Database Selection on Distributed Searching,” In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, to appear.

  • Tomasic, A., L. Gravano, C. Lue, P. Schwarz, and L. Haas (1997), “Data Structures for Efficient Broker Implementation,” ACM Transactions on Information Systems 15, 3, 223–253.

    Article  Google Scholar 

  • Xu, J. and J. Callan (1998), “Effective Retrieval with Distributed Collections,” In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 112–120.

  • Xu, J. and W.B. Croft (1999), “Cluster-based Language Models for Distributed Retrieval,” In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 254–261.

  • Yuwono, B. and D.L. Lee (1997), “Server Ranking for Distributed Text Retrieval Systems on Internet,” In Proceedings of the 5th International Conference on Database Systems for Advanced Applications, Melbourne, Australia, pp. 41–49.

  • Zobel, J. (1997), “Collection Selection via Lexicon Inspection,” In Proceedings of the 1997 Australian Document Computing Symposium, Melbourne, Australia, pp. 74–80.

Download references

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

French, J.C., Powell, A.L. Metrics for evaluating database selection techniques. World Wide Web 3, 153–163 (2000). https://doi.org/10.1023/A:1019241915635

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1019241915635

Keywords

Navigation