
The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework

Chapter in Building and Using Comparable Corpora

Abstract

This paper examines the status of comparable corpora as potential terminological resources, with particular reference to the application context of helping online translators. Over the past 15 years, bilingual term extraction technologies based on both parallel and comparable corpora have advanced considerably. Comparable corpora are widely held to be especially important because parallel corpora are scarce for many language pairs. However, human language practitioners, including online translators, make little use of terminological resources constructed by automatic methods; there appears to be a gap between what corpus-based automatic extraction can provide and what translators actually require. Against this backdrop, this paper first clarifies online translators’ requirements for terminology resources. On that basis, it examines what must be taken into account when comparable corpora are used for bilingual term extraction, if the resulting terminology resources are to be used by translators in practice. The discussion is deductive rather than empirical; it draws on the authors’ experience of talking with online translators while developing the integrated translation-hosting and translation-aid site Minna no Hon’yaku (MNH: translation of/by/for all) since 2005 (the site has been open to the public since April 2009).
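The corpus-based extraction methods the abstract refers to commonly build on the context-vector ("direct") approach associated with Fung's and Rapp's work on non-parallel corpora. The sketch below illustrates that general idea only; it is not the method of this chapter. It assumes pre-tokenised toy corpora, a small hypothetical seed dictionary, raw co-occurrence counts in place of the association measures (such as log-likelihood) used in practice, and cosine similarity for ranking. All function names and data here are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within +/- `window` tokens."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            vec = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vectors


def project(vec, seed_dict):
    """Translate a source context vector into the target language via the seed lexicon.
    Context words with no dictionary entry are simply dropped."""
    out = Counter()
    for word, count in vec.items():
        for translation in seed_dict.get(word, []):
            out[translation] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(term, src_vecs, tgt_vecs, seed_dict, top_n=5):
    """Rank every target-language word by similarity to the projected vector of `term`."""
    projected = project(src_vecs.get(term, Counter()), seed_dict)
    scores = [(cand, cosine(projected, vec)) for cand, vec in tgt_vecs.items()]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties alphabetical
    return scores[:top_n]


# Hypothetical toy data: pre-tokenised English and romanised Japanese "corpora".
src_corpus = [["patient", "receives", "insulin", "injection"],
              ["insulin", "lowers", "glucose"]]
tgt_corpus = [["kanja", "insurin", "chuusha", "ukeru"],
              ["insurin", "budoto", "sageru"]]
seed_dict = {"patient": ["kanja"], "receives": ["ukeru"], "injection": ["chuusha"],
             "lowers": ["sageru"], "glucose": ["budoto"]}

src_vecs = context_vectors(src_corpus)
tgt_vecs = context_vectors(tgt_corpus)
ranking = rank_candidates("insulin", src_vecs, tgt_vecs, seed_dict)
for candidate, score in ranking:
    print(f"{candidate}\t{score:.3f}")
```

On this toy data, `insurin` is ranked first for `insulin`, because its target-side context matches the projected source context exactly. The output is a ranked candidate list rather than a validated term pair, which is one concrete form of the gap between automatically extracted resources and what translators need.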




Acknowledgments

This work is partly supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid (A) 21240021, “Developing an integrated translation-aid site which provides comprehensive reference sources for translators,” and by the 2011 NII research cooperation project “Automatic construction of practically useful English–Japanese terminological lexica using Web information resources.”

Author information

Correspondence to Kyo Kageura.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kageura, K., Abekawa, T. (2013). The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds) Building and Using Comparable Corpora. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_15


  • DOI: https://doi.org/10.1007/978-3-642-20128-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20127-1

  • Online ISBN: 978-3-642-20128-8

  • eBook Packages: Computer Science, Computer Science (R0)
