Abstract
The paper presents a model for typed, structured documents to improve the quality of retrieval. The document type determines the representative characteristics of the document content. Document content is indexed in order to support structured and more precise queries. The paper gives a formalism for typed, structured documents and defines a suite of tools that operate on typed documents. In particular, we define document creation, document verification, and document translation. In addition, the paper presents performance measurements for retrieval of structured documents, based on established recall and precision tests in information theory.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dharap, C., Bowman, C.M. (1997). Typed structured documents for information retrieval. In: Nicholas, C., Wood, D. (eds) Principles of Document Processing. PODP 1996. Lecture Notes in Computer Science, vol 1293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63620-X_60
DOI: https://doi.org/10.1007/3-540-63620-X_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63620-5
Online ISBN: 978-3-540-69614-8
eBook Packages: Springer Book Archive