Abstract
Structured document retrieval aims at retrieving the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, with which experiments were carried out. The analysis of the experiments provides estimates of the acc values.
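The abstract does not state how acc enters the ranking formula, so the following is only an illustrative sketch under stated assumptions, not the paper's probabilistic relational algebra definition. It assumes that a component's score is its own tf-idf score plus the scores of its sub-components, each damped by a single accessibility factor `acc`. The `Component` class, the `idf_table` and `score` functions, and the additive aggregation are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a document tree (hypothetical structure, not from the paper)."""
    name: str
    terms: list[str]
    children: list["Component"] = field(default_factory=list)


def all_components(root):
    """Yield the root and every descendant component."""
    yield root
    for child in root.children:
        yield from all_components(child)


def idf_table(root):
    """Classical idf, computed over components instead of whole documents."""
    comps = list(all_components(root))
    vocab = {t for c in comps for t in c.terms}
    return {
        t: math.log(len(comps) / sum(1 for c in comps if t in c.terms))
        for t in vocab
    }


def score(component, query, idf, acc):
    """Assumed aggregation: own tf-idf plus acc-damped scores of the children."""
    own = sum(component.terms.count(t) * idf.get(t, 0.0) for t in query)
    inherited = sum(acc * score(child, query, idf, acc) for child in component.children)
    return own + inherited


# Toy document: one section containing two paragraphs.
p1 = Component("para1", ["retrieval", "retrieval"])
p2 = Component("para2", ["structure"])
section = Component("section", [], [p1, p2])

idf = idf_table(section)
for node in all_components(section):
    print(node.name, score(node, ["retrieval"], idf, acc=0.5))
```

Under this assumption, an acc value below 1 makes a parent component score lower than the sub-component that actually contains the query terms, so the more specific component is ranked first. An acc value of 1 removes that preference.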
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Roelleke, T., Lalmas, M., Kazai, G., Ruthven, I., Quicker, S. (2002). The Accessibility Dimension for Structured Document Retrieval. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_19
DOI: https://doi.org/10.1007/3-540-45886-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43343-9
Online ISBN: 978-3-540-45886-9
eBook Packages: Springer Book Archive