Abstract
During 2001 and 2002, our Delos/NSF working group explored the possibilities that emerging language technologies open up for teaching, learning, and research in the broad area of cultural heritage. On the one hand, these technologies will profoundly redefine the research and teaching of all those working with cultural heritage languages. On the other hand, developers of language technology would benefit from exploring the needs of new audiences and new collections. While multilingual technologies may ultimately prove the most revolutionary, this report focuses on monolingual technologies such as information extraction, summarization, and other aspects of document understanding. In this paper, we describe some of the audiences affected and the technologies to be evaluated, and we argue for the creation of venues where the application of these technologies to cultural heritage materials can be rigorously evaluated. The potential impact of language technologies on our understanding of the past will emerge over a long period of time and will doubtless include many techniques not covered here. We make no claim to a comprehensive survey. Our goal is to provide enough information to suggest the potential importance of these new technologies.
Cite this article
Crane, G., Bontcheva, K., Rydberg-Cox, J. et al. Emerging language technologies and the rediscovery of the past: a research agenda. Int J Digit Libr 5, 309–316 (2005). https://doi.org/10.1007/s00799-004-0096-6
DOI: https://doi.org/10.1007/s00799-004-0096-6