Abstract
This paper examines technology developed to support large-scale distributed digital libraries. We describe how collection information is harvested using standard information retrieval protocols, in particular features of the Z39.50 protocol, and how that information is used to build collection representatives for collection ranking and retrieval. The system we have developed takes a probabilistic approach to distributed information retrieval: it uses a logistic regression algorithm to estimate the relevance of each distributed collection and applies fusion techniques to combine multiple sources of evidence. The extracted collection representatives are then ranked using a fusion of probabilistic retrieval methods. We compare the effectiveness of our algorithm with other distributed search methods on test collections developed for distributed search evaluation. We also describe how this system is currently being applied to operational systems in the U.K.
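The abstract names two ranking ingredients: a logistic regression estimate of collection relevance and the fusion of multiple sources of evidence. The sketch below shows one way these pieces could fit together; it is not the paper's model. The feature set (x1 to x3), the coefficients in `COEFFS`, the `coverage` evidence source and all names (`CollectionRep`, `lr_probability`, `rank_collections`) are hypothetical, and CombSUM over min-max normalized scores is swapped in as a standard, simple fusion rule. Collection representatives are assumed to be harvested already (for example as term/occurrence pairs from Z39.50 Scan responses); the harvesting step itself is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class CollectionRep:
    """Hypothetical collection representative: harvested index terms mapped to
    occurrence counts (assumed to come from, e.g., Z39.50 Scan responses)."""
    name: str
    term_freq: dict[str, int]


# Hypothetical coefficients (b0, b1, b2, b3). Real values would be fitted by
# logistic regression on training queries with known relevant collections.
COEFFS = (-3.5, 1.2, 0.8, 0.5)


def lr_probability(query, rep, df, n_collections, coeffs=COEFFS):
    """Estimate P(relevant | collection, query) as a logistic function of features."""
    b0, b1, b2, b3 = coeffs
    matched = [t for t in set(query) if rep.term_freq.get(t, 0) > 0]
    m = len(matched)
    norm = 1.0 / math.sqrt(m + 1)  # damp the feature sums as more terms match
    # x1: log frequency of the matching query terms inside this collection
    x1 = norm * sum(math.log(1 + rep.term_freq[t]) for t in matched)
    # x2: log inverse collection frequency (terms held by fewer collections weigh more)
    x2 = norm * sum(math.log(n_collections / df[t]) for t in matched)
    # x3: number of distinct query terms the collection contains
    x3 = m
    log_odds = b0 + b1 * x1 + b2 * x2 + b3 * x3
    return 1.0 / (1.0 + math.exp(-log_odds))


def coverage(query, rep):
    """Hypothetical second evidence source: fraction of query terms present."""
    terms = set(query)
    if not terms:
        return 0.0
    return sum(1 for t in terms if rep.term_freq.get(t, 0) > 0) / len(terms)


def min_max(scores):
    """Rescale a {name: score} dict to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def rank_collections(query, reps, coeffs=COEFFS):
    """Rank collections by CombSUM fusion of the LR estimate and term coverage."""
    df = {}  # term -> number of collections whose representative contains it
    for rep in reps:
        for t, f in rep.term_freq.items():
            if f > 0:
                df[t] = df.get(t, 0) + 1
    n = len(reps)
    lr = {r.name: lr_probability(query, r, df, n, coeffs) for r in reps}
    cov = {r.name: coverage(query, r) for r in reps}
    lr_n, cov_n = min_max(lr), min_max(cov)
    fused = {name: lr_n[name] + cov_n[name] for name in lr}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    reps = [
        CollectionRep("A", {"digital": 50, "library": 30, "archive": 5}),
        CollectionRep("B", {"digital": 2, "physics": 80}),
        CollectionRep("C", {"library": 10, "history": 40}),
    ]
    for name, score in rank_collections(["digital", "library"], reps):
        print(f"{name}: {score:.3f}")
```

With these made-up collections the query ranks A first (it matches both terms frequently), then C, then B. Normalizing each evidence source before summing keeps one score scale from dominating the fusion; a fitted model would replace the placeholder coefficients.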
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Larson, R.R. (2003). Distributed IR for Digital Libraries. In: Koch, T., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2003. Lecture Notes in Computer Science, vol 2769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45175-4_44
DOI: https://doi.org/10.1007/978-3-540-45175-4_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40726-3
Online ISBN: 978-3-540-45175-4
eBook Packages: Springer Book Archive