Abstract
How can corpora assist translators in ways that resources such as translation memories or term databases cannot? Our tests on the English, Polish and Swedish parts of the JRC-Acquis Multilingual Parallel Corpus show that corpora can support term standardization and variation and, most importantly, the tracing of novel expressions. A corpus tool with an explicit dictionary representation is particularly suitable for the last task. Culler is a tool that allows one to select expressions containing words absent from its dictionary. Even though the extracted material may contain some noise, it is of clear value to translators and lexicographers. The quality of extraction depends, unsurprisingly, on the dictionary and on text processing, but also on the query.
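The idea of selecting expressions that contain words absent from a dictionary can be illustrated with a minimal sketch. This is not Culler's implementation or query syntax. The function name `novel_expressions`, the toy dictionary, and the simple regular-expression tokenizer are all assumptions made for illustration only.

```python
import re

def novel_expressions(text, dictionary, window=1):
    """Return (word, context) pairs for tokens that are absent from `dictionary`.

    Hypothetical helper: a real corpus tool would use proper tokenization
    and morphological analysis instead of lowercase exact matching.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        # Skip plain numbers; they are unknown to the dictionary but not novel words.
        if tok not in dictionary and not tok.isdigit():
            start = max(0, i - window)
            context = " ".join(tokens[start:i + window + 1])
            hits.append((tok, context))
    return hits

# Toy dictionary and sentence (assumed data, not taken from JRC-Acquis).
dictionary = {"the", "council", "adopted", "a", "regulation", "on"}
sample = "The Council adopted a regulation on flexicurity"
print(novel_expressions(sample, dictionary))
```

As the abstract notes, the usefulness of such output depends on the dictionary and on text processing: with exact matching, any inflected form missing from the dictionary would be reported as novel, which is one source of noise.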
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dura, E., Gawronska, B. (2009). Novelty Extraction from Special and Parallel Corpora. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_25
DOI: https://doi.org/10.1007/978-3-642-04235-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)