Typed structured documents for information retrieval

  • Conference paper
Principles of Document Processing (PODP 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1293)

Abstract

The paper presents a model for typed, structured documents intended to improve retrieval quality. The document type determines the representative characteristics of the document content, and the content is indexed so that queries can be structured and more precise. The paper gives a formalism for typed, structured documents and defines a suite of tools that operate on them: document creation, document verification, and document translation. In addition, the paper presents performance measurements for retrieval of structured documents, based on the established recall and precision tests of information retrieval.
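
The recall and precision measures named in the abstract can be illustrated with a minimal sketch. It shows only the standard set-based definitions for a single query; the function name and the document identifiers are hypothetical and are not taken from the paper, whose actual test procedure is not described on this page.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four retrieved documents are relevant, so precision is 2/4, and two of the three relevant documents were found, so recall is 2/3.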

Editor information

Charles Nicholas, Derick Wood

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dharap, C., Bowman, C.M. (1997). Typed structured documents for information retrieval. In: Nicholas, C., Wood, D. (eds) Principles of Document Processing. PODP 1996. Lecture Notes in Computer Science, vol 1293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63620-X_60

  • DOI: https://doi.org/10.1007/3-540-63620-X_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63620-5

  • Online ISBN: 978-3-540-69614-8

  • eBook Packages: Springer Book Archive
