
Emerging language technologies and the rediscovery of the past: a research agenda

  • Regular contribution
International Journal on Digital Libraries

Abstract

During 2001 and 2002, our Delos/NSF working group explored the possibilities that emerging language technologies open up for teaching, learning, and research in the broad area of cultural heritage. On the one hand, emerging language technologies will profoundly redefine the research and teaching of all those working with cultural heritage languages. At the same time, developers of language technology would also benefit from exploring the needs of new audiences and new collections. While multilingual technologies may ultimately prove the most revolutionary, this report focuses on monolingual technologies such as information extraction, summarization, and other aspects of document understanding. In this paper, we describe some of the audiences affected and technologies to be evaluated and argue for the creation of venues where the application of these technologies to cultural heritage materials can be rigorously evaluated. The potential impact of language technologies for our understanding of the past will emerge over a long period of time and will doubtless include many techniques not covered here. We make no claim to a comprehensive survey. Our goal is to provide enough information to suggest the potential importance of these new technologies.




Crane, G., Bontcheva, K., Rydberg-Cox, J. et al. Emerging language technologies and the rediscovery of the past: a research agenda. Int J Digit Libr 5, 309–316 (2005). https://doi.org/10.1007/s00799-004-0096-6
