The Ancient Greek and Latin Dependency Treebanks

Abstract

This paper describes the development, composition, and several uses of the Ancient Greek and Latin Dependency Treebanks, large collections of Classical texts in which the syntactic, morphological and lexical information for each word is made explicit. To date, over 200 individuals from around the world have collaborated to annotate over 350,000 words, including the entirety of Homer’s Iliad and Odyssey, Sophocles’ Ajax, all of the extant works of Hesiod and Aeschylus, and selections from Caesar, Cicero, Jerome, Ovid, Petronius, Propertius, Sallust and Vergil. While perhaps the most straightforward value of such an annotated corpus for Classical philology is the morphosyntactic searching it makes possible, it also enables a large number of downstream tasks, such as inducing the syntactic behavior of lexemes and automatically identifying similar passages between texts.

Acknowledgements

Grants from the Alpheios Project (“Building a Greek Treebank”), the National Endowment for the Humanities (PR-50013-08, “The Dynamic Lexicon: Cyberinfrastructure and the Automated Analysis of Historical Languages”), the Andrew W. Mellon Foundation (“The CyberEdition Project: Workflow for Textual Data in Cyberinfrastructure”), the Digital Library Initiative Phase 2 (IIS-9817484) and the National Science Foundation (BCS-0616521) provided support for this work. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This paper is made available under a Creative Commons Attribution license.

Corresponding author

Correspondence to David Bamman.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bamman, D., Crane, G. (2011). The Ancient Greek and Latin Dependency Treebanks. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_5

  • DOI: https://doi.org/10.1007/978-3-642-20227-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20226-1

  • Online ISBN: 978-3-642-20227-8

  • eBook Packages: Computer Science (R0)
