The Accessibility Dimension for Structured Document Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2291)

Abstract

Structured document retrieval aims at retrieving the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, with which experiments were carried out. The analysis of the experiments provides estimates of the acc values.
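
To make the idea of combining tf, idf and acc over a document tree more concrete, the following is a minimal sketch only. It is not the paper's probabilistic relational algebra formulation, which the abstract does not spell out. It assumes that a component's weight for a term is its own tf times idf plus the weights of its children scaled by a single acc factor, and that idf has the form log(N / n_t) computed over leaf components. The names `Component`, `idf` and `score` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A node in a document tree, e.g. a document, section or paragraph."""
    name: str
    terms: List[str] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)


def idf(term: str, leaves: List[Component]) -> float:
    """Assumed idf form: log(N / n_t) over the leaf components."""
    n_t = sum(1 for leaf in leaves if term in leaf.terms)
    return math.log(len(leaves) / n_t) if n_t else 0.0


def score(component: Component, term: str, idf_value: float, acc: float) -> float:
    """Own tf * idf plus children's scores damped by acc (an assumption)."""
    own = component.terms.count(term) * idf_value
    inherited = sum(score(child, term, idf_value, acc)
                    for child in component.children)
    return own + acc * inherited


# Illustrative data: one document with two sections.
sec1 = Component("sec1", terms=["retrieval", "retrieval"])
sec2 = Component("sec2", terms=["structure"])
doc = Component("doc", children=[sec1, sec2])

idf_retrieval = idf("retrieval", [sec1, sec2])
sec1_score = score(sec1, "retrieval", idf_retrieval, acc=0.5)
doc_score = score(doc, "retrieval", idf_retrieval, acc=0.5)
```

Under these assumptions, an acc value below 1 lets the section that actually contains the term outrank its enclosing document, which matches the stated aim of retrieving the best-fitting component rather than a pre-defined unit.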

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roelleke, T., Lalmas, M., Kazai, G., Ruthven, I., Quicker, S. (2002). The Accessibility Dimension for Structured Document Retrieval. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_19

  • DOI: https://doi.org/10.1007/3-540-45886-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43343-9

  • Online ISBN: 978-3-540-45886-9

  • eBook Packages: Springer Book Archive
