Abstract
Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.
Cite this article
Walker, D. Taking Snapshots of the Web with a TEI Camera. Computers and the Humanities 33, 185–192 (1999). https://doi.org/10.1023/A:1001735413255