Evaluating Terminology Translation in MT

Haque, Rejwanul; Hasanuzzaman, Mohammed; Way, Andy

doi:10.1007/978-3-031-24337-0_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Terminology translation plays a crucial role in domain-specific machine translation (MT). Preserving domain knowledge from source to target is arguably the primary concern of clients in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. Despite its importance to the translation industry, the evaluation of terminology translation has received comparatively little attention in MT research. Term translation quality in MT is usually assessed by domain experts, in either academia or industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. As a result, manual intervention is often needed to evaluate terminology translation in MT, which is inherently time-consuming and expensive. This is especially impractical in an industrial setting, where customised MT systems frequently need to be updated (e.g. when new training data or improved MT techniques become available). Hence, there is a genuine need for a faster and less expensive solution, one that could help end users instantly identify term translation problems in MT output. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. Since, to the best of our knowledge, no gold-standard dataset is available for measuring terminology translation quality in MT, we semi-automatically create one from an English–Hindi judicial-domain parallel corpus.
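The abstract does not give TermEval's formula. As a rough illustration only, the sketch below assumes a term-level hit rate: each gold-standard entry pairs a source term with its reference translation and a list of acceptable variants (for example, inflected forms), and a term counts as correctly translated if the reference or any variant occurs as a contiguous token sequence in the MT output of its sentence. The `GoldTerm` structure and the `contains_phrase` and `term_hit_rate` functions are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class GoldTerm:
    """Hypothetical gold-standard entry: source term, reference, accepted variants."""
    source: str
    reference: str
    variants: list[str] = field(default_factory=list)


def contains_phrase(hypothesis: str, phrase: str) -> bool:
    """True if `phrase` occurs in `hypothesis` as a contiguous run of whole tokens."""
    hyp_tokens = hypothesis.split()
    phrase_tokens = phrase.split()
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(hyp_tokens[i:i + n] == phrase_tokens
               for i in range(len(hyp_tokens) - n + 1))


def term_hit_rate(pairs: list[tuple[list[GoldTerm], str]]) -> float:
    """Fraction of gold terms whose reference or a variant appears in the MT output.

    Each item of `pairs` is (terms annotated in a sentence, MT output for it).
    """
    total = 0
    hits = 0
    for terms, hypothesis in pairs:
        for term in terms:
            total += 1
            candidates = [term.reference, *term.variants]
            if any(contains_phrase(hypothesis, c) for c in candidates):
                hits += 1
    return hits / total if total else 0.0


# Toy Hindi-to-English example: one of the two terms is translated as expected.
example = [
    (
        [
            GoldTerm("उच्च न्यायालय", "high court", ["high courts"]),
            GoldTerm("याचिका", "petition", ["petitions"]),
        ],
        "the high court dismissed the writ application",
    )
]
print(term_hit_rate(example))  # 0.5
```

Matching whole tokens rather than raw substrings avoids counting "high court" inside "high courtroom"; a real metric would need more careful tokenisation and morphological handling than this whitespace split.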

We train state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models in two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their terminology translation performance on the gold-standard test set we created. To measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set is validated by a human evaluator. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
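The abstract does not say which correlation statistic is used. One common choice is Pearson's r, sketched below from first principles as an assumption; it can be applied to system-level scores or to per-term 0/1 decisions (automatic match versus human verdict), in which case it reduces to the phi coefficient. The function name `pearson_r` and the toy data are illustrative only.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy per-term decisions: 1 = translation judged correct, 0 = judged incorrect.
automatic = [1, 1, 0, 1, 0, 1]
human = [1, 1, 0, 1, 1, 1]
print(round(pearson_r(automatic, human), 3))  # 0.632
```

On Python 3.10 and later, `statistics.correlation` computes the same quantity; the explicit version is shown so the arithmetic is visible.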

Notes

1. https://www.isi.edu/natural-language/software/nplm/.
2. http://www.statmt.org/moses/giza/GIZA++.html.
3. http://www.cfilt.iitb.ac.in/iitb_parallel/.
4. http://opus.lingfil.uu.se/.
5. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl.
6. https://en.wikipedia.org/wiki/SDL_Trados_Studio.
7. https://en.wikipedia.org/wiki/PyQt.
8. https://github.com/rejwanul-adapt/TermMarker.
9. https://github.com/rejwanul-adapt/EnHiTerminologyData.
10. Since Moses can output word-to-word alignments along with its translation (derived from the phrase table, if available), one could exploit this information to trace the target translation of a source term in the output. However, the alignment information has a few potential problems; for example, alignments may be null or erroneous. Note also that, at the time of this work, the transformer models of MarianNMT could not supply word alignments (i.e. attention weights). More importantly, we intend our evaluation method to be as generic as possible, so that it can be applied to the output of any MT system (e.g. an online commercial MT engine). This led us to avoid any such dependency.
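To make the alignment-based alternative in note 10 concrete, the sketch below projects a source term's token span through word alignments written in the Pharaoh-style `i-j` format (source index `i`, target index `j`). That input format is an assumption for illustration, not a description of Moses's exact output options, and the function names are hypothetical. The function returns `None` when every token of the term is null-aligned, one of the failure modes the note mentions; an erroneous alignment would instead silently return the wrong target words.

```python
from typing import List, Optional, Tuple

Alignment = List[Tuple[int, int]]


def parse_alignment(text: str) -> Alignment:
    """Parse Pharaoh-style 'i-j' pairs, e.g. '0-0 1-2' -> [(0, 0), (1, 2)]."""
    pairs = []
    for item in text.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_span(src_start: int, src_end: int, alignment: Alignment,
                 target_tokens: List[str]) -> Optional[str]:
    """Return the target words aligned to source tokens [src_start, src_end).

    Returns None if no token in the span is aligned (null alignment).
    """
    tgt_indices = sorted({t for s, t in alignment if src_start <= s < src_end})
    if not tgt_indices:
        return None
    # Take the smallest contiguous target window covering all aligned positions,
    # so any unaligned words lying between them are included as well.
    return " ".join(target_tokens[tgt_indices[0]:tgt_indices[-1] + 1])


# English-to-Hindi toy example: source "the high court dismissed the petition".
target = "उच्च न्यायालय ने याचिका खारिज कर दी".split()
links = parse_alignment("1-0 2-1 3-4 3-5 3-6 5-3")
print(project_span(1, 3, links, target))  # "high court" -> उच्च न्यायालय
print(project_span(0, 1, links, target))  # "the" is null-aligned -> None
```

Because this approach depends on the decoder exposing alignments, it cannot be applied to black-box systems, which is the limitation the note gives for not relying on it.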

References

BitterCorpus. https://hlt-mt.fbk.eu/technologies/bittercorpus. Accessed 28 Aug 2019
Arčan, M., Turchi, M., Tonelli, S., Buitelaar, P.: Enhancing statistical machine translation with bilingual terminology in a cat environment. In: Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas, pp. 54–68 (2014)
Google Scholar
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. CoRR abs/1607.06450 (2016). https://arxiv.org/abs/1607.06450
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd International Conference on Learning Representations, pp. 1–15. San Diego, CA (2015)
Google Scholar
Beyer, A.M., Macketanz, V., Burchardt, A., Williams, P.: Can out-of-the-box NMT beat a domain-trained Moses on technical data? In: Proceedings of EAMT User Studies and Project/Product Descriptions, pp. 41–46. Prague, Czech Republic (2017)
Google Scholar
Bojar, O., et al.: Hindencorp - Hindi-English and Hindi-only corpus for machine translation. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC, pp. 3550–3555 (2014)
Google Scholar
Burchardt, A., Macketanz, V., Dehdari, J., Heigold, G., Peter, J.T., Williams, P.: A linguistic evaluation of rule-based, phrase-based, and neural MT engines. Prague Bull. Math. Linguist. 108(1), 159–170 (2017)
Article Google Scholar
Cherry, C., Foster, G.: Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 427–436. Association for Computational Linguistics, Montréal, Canada (2012)
Google Scholar
Cho, K., van Merriënboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Doha, Qatar, October 2014
Google Scholar
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)
Google Scholar
Denkowski, M., Lavie, A.: Meteor 1.3: automatic metric for reliable optimization and evaluation of machine translation systems. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 85–91. Association for Computational Linguistics, Edinburgh, Scotland, July 2011
Google Scholar
Durrani, N., Schmid, H., Fraser, A.: A joint sequence translation model with integrated reordering. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 1045–1054. Association for Computational Linguistics, Portland, Oregon, USA, June 2011
Google Scholar
Farajian, M.A., Bertoldi, N., Negri, M., Turchi, M., Federico, M.: Evaluation of terminology translation in instance-based neural MT adaptation. In: Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pp. 149–158. Alicante, Spain (2018)
Google Scholar
Gage, P.: A new algorithm for data compression. C Users J. 12(2), 23–38 (1994)
Google Scholar
Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. CoRR abs/1512.05287 (2016). https://arxiv.org/abs/1512.05287
Haque, R., Hasanuzzaman, M., Way, A.: Investigating terminology translation in statistical and neural machine translation: a case study on English-to-Hindi and Hindi-to-English. In: Proceedings of RANLP 2019: Recent Advances in Natural Language Processing, pp. 437–446. Varna, Bulgaria (2019)
Google Scholar
Haque, R., Hasanuzzaman, M., Way, A.: Analysing terminology translation errors in statistical and neural machine translation. Mach. Transl. 34(2), 149–195 (2020)
Google Scholar
Haque, R., Penkale, S., Way, A.: Bilingual termbank creation via log-likelihood comparison and phrase-based statistical machine translation. In: Proceedings of the 4th International Workshop on Computational Terminology (Computerm), pp. 42–51. Dublin, Ireland (2014)
Google Scholar
Haque, R., Penkale, S., Way, A.: TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction. Lang. Resour. Eval. 52(2), 365–400 (2018). https://doi.org/10.1007/s10579-018-9412-4
Article Google Scholar
Hassan, H., et al.: Achieving human parity on automatic Chinese to English news translation, March 2018. ArXiv e-prints
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015). http://arxiv.org/abs/1512.03385
Heafield, K., Pouzyrevsky, I., Clark, J.H., Koehn, P.: Scalable modified kneser-ney language model estimation. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 690–696. Association for Computational Linguistics, Sofia, Bulgaria, August 2013
Google Scholar
Huang, G., Zhang, J., Zhou, Y., Zong, C.: A simple, straightforward and effective model for joint bilingual terms detection and word alignment in SMT. Nat. Lang. Underst. Intell. Appl. ICCPOL/NLPCC 2016 10102, 103–115 (2016)
Google Scholar
Huang, L., Chiang, D.: Forest rescoring: faster decoding with integrated language models. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 144–151. Association for Computational Linguistics, Prague, Czech Republic, June 2007
Google Scholar
Junczys-Dowmunt, M., Dwojak, T., Hoang, H.: Is neural machine translation ready for deployment? A case study on 30 translation directions. ArXiv e-prints (2016)
Google Scholar
Kalchbrenner, N., Blunsom, P.: Recurrent continuous translation models. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1700–1709. Seattle, WA, October 2013
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980
Koehn, P.: Statistical significance tests for machine translation evaluation. In: Lin, D., Wu, D. (eds.) Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 388–395. Association for Computational Linguistics, Barcelona, Spain, July 2004. http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Koehn.pdf
Koehn, P.: Europarl: a parallel corpus for statistical machine translation. In: Proceedings of MT Summit X: The Tenth Machine Translation Summit, pp. 79–86. Phuket, Thailand (2005)
Google Scholar
Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: ACL 2007, Proceedings of the Interactive Poster and Demonstration Sessions, pp. 177–180. Prague, Czech Republic (2007)
Google Scholar
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: HLT-NAACL 2003: Conference Combining Human Language Technology Conference Series and the North American Chapter of the Association for Computational Linguistics Conference Series, pp. 48–54. Edmonton, AB (2003)
Google Scholar
Kunchukuttan, A., Mehta, P., Bhattacharyya, P.: The IIT Bombay English-Hindi parallel corpus. CoRR 1710.02855 (2017). https://arxiv.org/abs/1710.02855
Lommel, A.R., Uszkoreit, H., Burchardt, A.: Multidimensional quality metrics (MQM): a framework for declaring and describing translation quality metrics. Tradumática: tecnologies de la traducció (12), 455–463 (2014)
Google Scholar
Macketanz, V., Avramidis, E., Burchardt, A., Helcl, J., Srivastava, A.: Machine translation: phrase-based, rule-based and neural approaches with linguistic evaluation. Cybern. Inf. Technol. 17(2), 28–43 (2017). https://content.sciendo.com/view/journals/pralin/108/1/article-p159.xml
Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)
Article MATH Google Scholar
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: ACL-2002: 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. ACL, Philadelphia, PA (2002)
Google Scholar
Pazienza, M.T., Pennacchiotti, M., Zanzotto, F.M.: Terminology extraction: an analysis of linguistic and statistical approaches. In: Sirmakessis, S. (ed.) Knowledge Mining, vol. 185, pp. 255–279. Springer, Berlin, Heidelberg (2005). https://doi.org/10.1007/3-540-32394-5_20
Pinnis, M., Ljubešić, N., Ştefănescu, D., Skadina, I., Tadić, M., Gornostay, T.: Term extraction, tagging, and mapping tools for under-resourced languages. In: Proceedings of the 10th Conference on Terminology and Knowledge Engineering (TKE 2012), pp. 193–208. Madrid, Spain (2012)
Google Scholar
Popović, M.: chrF: character n-gram f-score for automatic MT evaluation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, pp. 392–395. Association for Computational Linguistics, Lisbon, Portugal, September 2015
Google Scholar
Popović, M.: Comparing language related issues for NMT and PBMT between German and English. Prague Bull. Math. Linguist. 108(1), 209–220 (2017)
Article Google Scholar
Press, O., Wolf, L.: Using the output embedding to improve language models. CoRR abs/1608.05859 (2016). http://arxiv.org/abs/1608.05859
Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. CoRR abs/1511.06709 (2015). http://arxiv.org/abs/1511.06709
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany, August 2016
Google Scholar
Skadinš, R., Purinš, M., Skadina, I., Vasiljevs, A.: Evaluation of SMT in localization to under-resourced inflected language. In: Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT 2011), pp. 35–40. Leuven, Belgium (2011)
Google Scholar
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: In Proceedings of the 7th Biennial Conference of the Association for Machine Translation in the Americas (AMTA-2006), pp. 223–231. Cambridge, Massachusetts (2006)
Google Scholar
Specia, L., et al.: Translation quality and productivity: a study on rich morphology languages. In: Proceedings of MT Summit XVI, the 16th Machine Translation Summit, pp. 55–71. Asia-Pacific Association for Machine Translation, Nagoya, Japan (2017)
Google Scholar
Stanojević, M., Sima’an, K.: Beer: better evaluation as ranking. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 414–419. Association for Computational Linguistics, Baltimore, Maryland, USA, June 2014
Google Scholar
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 3104–3112. NIPS 2014, Montreal, Canada (2014)
Google Scholar
Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’2012), pp. 2214–2218. Istanbul, Turkey (2012)
Google Scholar
Toral, A., Sánchez-Cartagena, V.M.: A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. CoRR abs/1701.02901 (2017). http://arxiv.org/abs/1701.02901
Vaswani, A., et al.: Attention is all you need. CoRR abs/1706.03762 (2017). http://arxiv.org/abs/1706.03762
Vaswani, A., Zhao, Y., Fossum, V., Chiang, D.: Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1387–1392. Association for Computational Linguistics, Seattle, Washington, USA, October 2013
Google Scholar
Vintar, V.: Terminology translation accuracy in statistical versus neural MT: an evaluation for the English-Slovene language pair. In: Du, J., Arcan, M., Liu, Q., Isahara, H. (eds.) Proceedings of the LREC 2018 Workshop MLP-MomenT: The Second Workshop on Multi-Language Processing in a Globalising World and The First Workshop on Multilingualism at the intersection of Knowledge Bases and Machine Translation, pp. 34–37. European Language Resources Association (ELRA), Miyazaki, Japan, May 2018
Google Scholar
Wu, Y., et al.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016). http://arxiv.org/abs/1609.08144
Yeh, A.: More accurate tests for the statistical significance of result differences. In: Proceedings of the 18th Conference on Computational Linguistics - Volume 2, COLING 2000, pp. 947–953. Saarbrücken, Germany (2000)
Google Scholar
Ziemski, M., Junczys-Dowmunt, M., Pouliquen, B.: The united nations parallel corpus v1.0. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Portorož, Slovenia (2016)
Google Scholar

Download references

Acknowledgments

The ADAPT Centre for Digital Content Technology is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund.

Author information

Authors and Affiliations

Department of Computing, South East Technological University, Carlow, Ireland
Rejwanul Haque
School of Computing, Munster Technological University, Cork, Ireland
Mohammed Hasanuzzaman
School of Computing, Dublin City University, Glasnevin, Dublin 9, Dublin, Ireland
Andy Way

Corresponding author

Correspondence to Rejwanul Haque.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Haque, R., Hasanuzzaman, M., Way, A. (2023). Evaluating Terminology Translation in MT. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_35

DOI: https://doi.org/10.1007/978-3-031-24337-0_35
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)
