Evaluating Terminology Translation in MT

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Abstract

Terminology translation plays a crucial role in domain-specific machine translation (MT). Preserving domain knowledge from source to target is arguably the foremost concern for clients in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. Despite its importance to the translation industry, the evaluation of terminology translation has been a less examined area in MT research. Term translation quality in MT is usually measured by domain experts, either in academia or in industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. Manual intervention is therefore often needed, which is by nature a time-consuming and highly expensive task. This is infeasible in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g. availability of new training data or leading MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end-users instantly identify term translation problems in MT. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, there is no gold-standard dataset available for measuring terminology translation quality in MT. In the absence of a gold-standard evaluation test set, we semi-automatically create a gold-standard dataset from an English–Hindi judicial domain parallel corpus.

We trained state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models on two translation directions, English-to-Hindi and Hindi-to-English, and used TermEval to evaluate their performance on terminology translation over the created gold-standard test set. To measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set was validated by a human evaluator. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
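As a rough illustration of the validation step described above, the following sketch computes a Pearson correlation coefficient between per-term automatic decisions and human judgements. It is an assumption-laden example, not the procedure of the paper: the abstract does not say which correlation statistic was used, the helper name pearson is hypothetical, and the two lists are made-up toy values rather than results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy, made-up values (NOT data from the paper):
# 1 = term translation judged correct, 0 = judged incorrect.
metric_decisions = [1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical automatic metric
human_judgments = [1, 1, 0, 1, 0, 1, 1, 1]   # hypothetical human evaluator

r = pearson(metric_decisions, human_judgments)
print(f"Pearson r = {r:.3f}")
```

With binary decisions such as these, the Pearson coefficient coincides with the phi coefficient; a chance-corrected agreement statistic would be another reasonable choice for the same comparison.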

Notes

  1. https://www.isi.edu/natural-language/software/nplm/.

  2. http://www.statmt.org/moses/giza/GIZA++.html.

  3. http://www.cfilt.iitb.ac.in/iitb_parallel/.

  4. http://opus.lingfil.uu.se/.

  5. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl.

  6. https://en.wikipedia.org/wiki/SDL_Trados_Studio.

  7. https://en.wikipedia.org/wiki/PyQt.

  8. https://github.com/rejwanul-adapt/TermMarker.

  9. https://github.com/rejwanul-adapt/EnHiTerminologyData.

  10. Since Moses can supply word-to-word alignments with its output (i.e. translation) from the phrase table (if any), one can exploit this information to trace the target translation of a source term in the output. However, there are a few potential problems with the alignment information, e.g. there could be null or erroneous alignments. Note that, at the time of this work, the transformer models of MarianNMT could not supply word alignments (i.e. attention weights). In fact, our intention is to make our proposed evaluation method as generic as possible so that it can be applied to the output of any MT system (e.g. an online commercial MT engine). This led us to avoid any dependency on alignment information.

Acknowledgments

The ADAPT Centre for Digital Content Technology is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Rejwanul Haque.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Haque, R., Hasanuzzaman, M., Way, A. (2023). Evaluating Terminology Translation in MT. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_35

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science (R0)
