Abstract
This paper examines the status of comparable corpora as potential terminological resources, with special reference to the application framework of helping online translators. Over the past 15 years, we have witnessed great advances in bilingual term extraction technologies based on both parallel and comparable corpora. The use of comparable corpora is widely held to be especially important because parallel corpora are scarce for many language pairs. However, human language practitioners, including online translators, make little use of terminological resources constructed by automatic methods; there seems to be a gap between what corpus-based automatic extraction can provide and what translators actually require. Against this backdrop, this paper first clarifies online translators’ requirements for terminology resources. On that basis, it examines what should be taken into account when using comparable corpora for bilingual term extraction if the resulting terminology resources are to be actually used by translators. The discussion is deductive rather than empirical, and draws on the authors’ experience of talking with online translators while developing, since 2005, the integrated translation-hosting and translation-aid site Minna no Hon’yaku (MNH: translation of/by/for all), which has been open to the public since April 2009.
Acknowledgments
This work is partly supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid (A) 21240021, “Developing an integrated translation-aid site which provides comprehensive reference sources for translators”, and the 2011 NII research cooperation project “Automatic construction of practically useful English–Japanese terminological lexica using Web information resources”.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kageura, K., Abekawa, T. (2013). The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds) Building and Using Comparable Corpora. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_15
DOI: https://doi.org/10.1007/978-3-642-20128-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20127-1
Online ISBN: 978-3-642-20128-8
eBook Packages: Computer Science, Computer Science (R0)