
Taking Snapshots of the Web with a TEI Camera

Computers and the Humanities

Abstract

Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.
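
As a rough illustration of what re-encoding an HTML document into TEI-style markup can involve, the following minimal Python sketch maps a handful of HTML elements onto TEI elements. The mapping table, the class name HtmlToTei and the function html_to_tei are assumptions made for this example; they are not the Snapshot project's actual rules, and the element names follow current TEI usage (p, head, hi, ref) rather than any specific decision described in the report.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical HTML-to-TEI mapping, chosen for illustration only.
TAG_MAP = {
    "p": ("p", {}),
    "h1": ("head", {}),
    "h2": ("head", {}),
    "em": ("hi", {"rend": "italic"}),
    "i": ("hi", {"rend": "italic"}),
    "b": ("hi", {"rend": "bold"}),
    "strong": ("hi", {"rend": "bold"}),
}


class HtmlToTei(HTMLParser):
    """Collects a TEI-style rendering of the HTML elements it recognises."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Hyperlinks become <ref> elements pointing at the same URL.
            href = dict(attrs).get("href") or ""
            self.out.append("<ref target=%s>" % quoteattr(href))
        elif tag in TAG_MAP:
            name, tei_attrs = TAG_MAP[tag]
            rendered = "".join(
                " %s=%s" % (key, quoteattr(value))
                for key, value in tei_attrs.items()
            )
            self.out.append("<%s%s>" % (name, rendered))
        # Unrecognised tags are dropped; their text content is kept.

    def handle_endtag(self, tag):
        if tag == "a":
            self.out.append("</ref>")
        elif tag in TAG_MAP:
            self.out.append("</%s>" % TAG_MAP[tag][0])

    def handle_data(self, data):
        # Re-escape text so the output stays well-formed.
        self.out.append(escape(data))


def html_to_tei(html):
    """Return a TEI-style fragment for the given HTML fragment."""
    parser = HtmlToTei()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(html_to_tei('<h1>News</h1><p>A <b>new</b> <a href="http://example.org/">page</a>.</p>'))
```

A table-driven mapping like this keeps the encoding decisions in one place, which makes them easy to revise. A real conversion would also need a TEI header, handling of malformed HTML, and a policy for elements with no obvious TEI counterpart.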




About this article

Cite this article

Walker, D. Taking Snapshots of the Web with a TEI Camera. Computers and the Humanities 33, 185–192 (1999). https://doi.org/10.1023/A:1001735413255


  • DOI: https://doi.org/10.1023/A:1001735413255
