Metrics for evaluating database selection techniques

French, James C.; Powell, Allison L.

doi:10.1023/A:1019241915635

Published: November 2000

World Wide Web, Volume 3, pages 153–163 (2000)

Abstract

The increasing availability of online databases and other information resources in digital libraries and on the World Wide Web has created the need for efficient and effective algorithms for selecting databases to search. A number of techniques have been proposed for query routing or database selection. We have developed a methodology and metrics that can be used to directly compare competing techniques. They can also be used to isolate factors that influence the performance of these techniques so that we can better understand performance issues. In this paper, we describe the methodology we have used to examine the performance of database selection algorithms such as gGlOSS and CORI. In addition, we develop the theory behind a “random” database selection algorithm and show how it can be used to help analyze the behavior of realistic database selection algorithms.

About this article

French, J.C., Powell, A.L. Metrics for evaluating database selection techniques. World Wide Web 3, 153–163 (2000). https://doi.org/10.1023/A:1019241915635
