XML and the TEI

Computers and the Humanities

Abstract

Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.

About this article

Cite this article

DeRose, S. XML and the TEI. Computers and the Humanities 33, 11–30 (1999). https://doi.org/10.1023/A:1001771114509
