Analysing terminology translation errors in statistical and neural machine translation

Machine Translation

Abstract

Terminology translation plays a critical role in domain-specific machine translation (MT). Phrase-based statistical MT (PB-SMT) has been the dominant approach to MT for the past 30 years, both in academia and industry. Neural MT (NMT), an end-to-end learning approach to MT, is steadily taking the place of PB-SMT. In this paper, we conduct a comparative qualitative evaluation and a comprehensive error analysis of terminology translation in PB-SMT and NMT in two translation directions: English-to-Hindi and Hindi-to-English. To the best of our knowledge, no gold standard is available for evaluating terminology translation quality in MT. For this reason, we select an evaluation test set from a legal-domain corpus and create a gold standard for evaluating terminology translation in MT. We also propose an error typology that covers terminology translation errors in MT. We translate the sentences of the test set with our MT systems and manually classify the resulting terminology translations according to this typology. We evaluate the MT systems' performance on terminology translation and present our findings, unraveling the strengths, weaknesses, and similarities of PB-SMT and NMT in the area of term translation.
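The evaluation described above rests on manually classifying each term translation against an error typology and then comparing systems. As a rough illustration of how such annotations might be tallied, the Python sketch below counts error categories per system, computes the share of terms judged correct, and lists terms that both systems translated wrongly. The record type `TermJudgement`, the helper names, and the category labels (`correct`, `inflection`, `omission`) are hypothetical placeholders, not the paper's actual typology or tooling, and the toy data is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TermJudgement:
    """One manually classified term translation (hypothetical record format)."""
    system: str       # MT system that produced the translation, e.g. "PB-SMT" or "NMT"
    source_term: str  # term taken from the gold-standard test set
    category: str     # placeholder label; "correct" marks an acceptable translation


def summarise(judgements):
    """Return per-system category counts and the fraction of terms labelled "correct"."""
    by_system = {}
    for j in judgements:
        by_system.setdefault(j.system, Counter())[j.category] += 1
    summary = {}
    for system, counts in by_system.items():
        total = sum(counts.values())
        summary[system] = {
            "counts": dict(counts),
            # Counter returns 0 for a missing key, so a system with no correct terms scores 0.0
            "accuracy": counts["correct"] / total,
        }
    return summary


def shared_errors(judgements, sys_a, sys_b):
    """Source terms that both systems translated with some error (any non-"correct" label)."""
    wrong = {sys_a: set(), sys_b: set()}
    for j in judgements:
        if j.system in wrong and j.category != "correct":
            wrong[j.system].add(j.source_term)
    return wrong[sys_a] & wrong[sys_b]


if __name__ == "__main__":
    # Toy annotations for illustration only; these are not results from the paper.
    data = [
        TermJudgement("PB-SMT", "t1", "correct"),
        TermJudgement("PB-SMT", "t2", "inflection"),
        TermJudgement("PB-SMT", "t3", "omission"),
        TermJudgement("PB-SMT", "t4", "correct"),
        TermJudgement("NMT", "t1", "correct"),
        TermJudgement("NMT", "t2", "correct"),
        TermJudgement("NMT", "t3", "omission"),
        TermJudgement("NMT", "t4", "correct"),
    ]
    stats = summarise(data)
    print(stats["PB-SMT"]["accuracy"])           # 0.5
    print(stats["NMT"]["accuracy"])              # 0.75
    print(shared_errors(data, "PB-SMT", "NMT"))  # {'t3'}
```

Reporting per-category counts alongside overall accuracy keeps the typology visible, which is what a side-by-side comparison of the two systems' strengths and weaknesses relies on; the shared-error set is one simple way to surface similarities between them.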

Notes

  1. The MT research community views the WMT translation shared tasks (http://www.statmt.org/wmt19/) as the benchmark for evaluating automatic translation systems. In the WMT16 translation shared task (Bojar et al. 2016), NMT rose to prominence, surpassing the then-mainstream method (i.e. PB-SMT) in a number of translation tasks (e.g. Sennrich et al. 2016a). In the WMT18 translation shared task (Bojar et al. 2018), the majority of the submissions (33) were based on deep-learning approaches, and only three submissions were PB-SMT models.

  2. International Workshop on Spoken Language Translation (http://workshop2015.iwslt.org/).

  3. A field within geomorphology, specializing in the study of karst formations. https://en.wiktionary.org/wiki/karstology.

  4. https://www.isi.edu/natural-language/software/nplm/.

  5. http://www.statmt.org/moses/giza/GIZA++.html.

  6. http://www.cfilt.iitb.ac.in/iitb_parallel/.

  7. http://opus.lingfil.uu.se/.

  8. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl.

  9. https://en.wikipedia.org/wiki/SDL_Trados_Studio.

  10. https://en.wikipedia.org/wiki/PyQt.

  11. https://github.com/rejwanul-adapt/TermMarker.

  12. https://github.com/rejwanul-adapt/EnHiTerminologyData.

  13. For clarity, we show Hindi in Roman script rather than Devanagari in the translation examples. Note that the Hindi corpus itself was in Devanagari script.

  14. As a proper noun, "Hindi" should begin with a capital letter. However, we carried out our experiments on lowercased text, which is why this named entity is shown in lowercase.

  15. In this example, the reference English sentence is the literal translation of the source Hindi sentence.

  16. Halsbury is a location name whose first letter appears here in lowercase (cf. footnote 14).

References

  • Arčan M, Buitelaar P (2017) Translating domain-specific expressions in knowledge bases with neural machine translation. CoRR. arXiv:1709.02184

  • Arčan M, Turchi M, Tonelli S, Buitelaar P (2017) Leveraging bilingual terminology to improve machine translation in a CAT environment. Nat Lang Eng 23(5):763–788

  • Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. CoRR. arXiv:1607.06450

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International conference on learning representations (ICLR 2015), San Diego, CA

  • Bentivogli L, Bisazza A, Cettolo M, Federico M (2016) Neural versus phrase-based machine translation quality: a case study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 257–267, Austin, TX

  • Beyer AM, Macketanz V, Burchardt A, Williams P (2017) Can out-of-the-box NMT beat a domain-trained Moses on technical data? In: Proceedings of EAMT user studies and project/product descriptions, pp 41–46, Prague, Czech Republic

  • Bojar O, Diatka V, Rychlý P, Straňák P, Suchomel V, Tamchyna A, Zeman D (2014) HindEnCorp—Hindi-English and Hindi-only corpus for machine translation. In: Proceedings of the ninth international language resources and evaluation conference (LREC’14), pp 3550–3555, Reykjavik, Iceland

  • Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K, Zampieri M (2016) Findings of the 2016 conference on machine translation. In: Proceedings of the first conference on machine translation, pp 131–198, Berlin, Germany

  • Bojar O, Federmann C, Fishel M, Graham Y, Haddow B, Huck M, Koehn P, Monz C (2018) Findings of the 2018 conference on machine translation (WMT18). In: Proceedings of the third conference on machine translation, vol. 2: shared task papers, pp 272–307. Association for Computational Linguistics, Brussels, Belgium

  • Burchardt A, Macketanz V, Dehdari J, Heigold G, Peter J-T, Williams P (2017) A linguistic evaluation of rule-based, phrase-based, and neural MT engines. Prague Bull Math Linguist 108(1):159–170

  • Castilho S, Moorkens J, Gaspari F, Sennrich R, Sosoni V, Georgakopoulou P, Lohar P, Way A, Barone AVM, Gialama M (2017) A comparative quality evaluation of PBSMT and NMT using professional translators. In: Proceedings of MT Summit XVI, the 16th machine translation summit, pp 116–131, Nagoya, Japan

  • Cettolo M, Niehues J, Stüker S, Bentivogli L, Cattoni R, Federico M (2015) The IWSLT 2015 evaluation campaign. In: Proceedings of the twelfth international workshop on spoken language translation (IWSLT 2015), Da Nang, Vietnam

  • Chatterjee R, Negri M, Turchi M, Federico M, Specia L, Blain F (2017) Guiding neural machine translation decoding with external knowledge. In: Proceedings of the second conference on machine translation, pp 157–168. Association for Computational Linguistics, Copenhagen, Denmark

  • Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, pp 427–436, Montréal, Canada

  • Cho K, van Merriënboer B, Gülçehre Ç, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1724–1734, Doha, Qatar

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Crego JM, Kim J, Klein G, Rebollo A, Yang K, Senellart J, Akhanov E, Brunelle P, Coquard A, Deng Y, Enoue S, Geiss C, Johanson J, Khalsa A, Khiari R, Ko B, Kobus C, Lorieux J, Martins L, Nguyen D, Priori A, Riccardi T, Segal N, Servan C, Tiquet C, Wang B, Yang J, Zhang D, Zhou J, Zoldan P (2016) Systran’s pure neural machine translation systems. CoRR. arXiv:1610.05540

  • Denkowski M, Lavie A (2011) Meteor 1.3: automatic metric for reliable optimization and evaluation of machine translation systems. In: Proceedings of the sixth workshop on statistical machine translation, pp 85–91, Edinburgh, Scotland

  • Durrani N, Schmid H, Fraser A (2011) A joint sequence translation model with integrated reordering. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 1045–1054, Portland, Oregon, USA

  • Farajian MA, Turchi M, Negri M, Bertoldi N, Federico M (2017) Neural vs. phrase-based machine translation in a multi-domain scenario. In: Proceedings of the 15th conference of the European Chapter of the Association for Computational Linguistics, vol 2 (short papers), pp 280–284, Valencia, Spain

  • Farajian MA, Bertoldi N, Negri M, Turchi M, Federico M (2018) Evaluation of terminology translation in instance-based neural MT adaptation. In: Proceedings of the 21st annual conference of the European Association for Machine Translation, pp 149–158, Alicante, Spain

  • Gage P (1994) A new algorithm for data compression. C Users J 12(2):23–38

  • Gal Y, Ghahramani Z (2016) A theoretically grounded application of dropout in recurrent neural networks. CoRR. arXiv:1512.05287

  • Haque R, Penkale S, Way A (2014) Bilingual termbank creation via log-likelihood comparison and phrase-based statistical machine translation. In: Proceedings of the 4th international workshop on computational terminology (Computerm), pp 42–51, Dublin, Ireland

  • Haque R, Penkale S, Way A (2018) TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction. Lang Resour Eval 52(2):365–400

  • Haque R, Hasanuzzaman M, Way A (2019a) Investigating terminology translation in statistical and neural machine translation: a case study on English-to-Hindi and Hindi-to-English. In: Proceedings of RANLP 2019: recent advances in natural language processing, pp 437–446, Varna, Bulgaria

  • Haque R, Hasanuzzaman M, Way A (2019b) TermEval: an automatic metric for evaluating terminology translation in MT. In: Proceedings of CICLing 2019, the 20th international conference on computational linguistics and intelligent text processing, La Rochelle, France

  • Haque R, Hasanuzzaman M, Way A (2019c) Terminology translation in low-resource scenarios. Information 10(9):273

  • Hasler E, de Gispert A, Iglesias G, Byrne B (2018) Neural machine translation decoding with terminology constraints. In: Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2 (short papers), pp 506–512. Association for Computational Linguistics, New Orleans, LA

  • Hassan H, Aue A, Chen C, Chowdhary V, Clark J, Federmann C, Huang X, Junczys-Dowmunt M, Lewis W, Li M, Liu S, Liu T, Luo R, Menezes A, Qin T, Seide F, Tan X, Tian F, Wu L, Wu S, Xia Y, Zhang D, Zhang Z, Zhou M (2018) Achieving human parity on automatic Chinese to English news translation. CoRR. arXiv:1803.05567

  • He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. CoRR. arXiv:1512.03385

  • Heafield K, Pouzyrevsky I, Clark JH, Koehn P (2013) Scalable modified Kneser–Ney language model estimation. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics (vol. 2: short papers), pp 690–696, Sofia, Bulgaria

  • Hokamp C, Liu Q (2017) Lexically constrained decoding for sequence generation using grid beam search. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (vol. 1: long papers), pp 1535–1546, Vancouver, BC

  • Huang G, Zhang J, Zhou Y, Zong C (2016) A simple, straightforward and effective model for joint bilingual terms detection and word alignment in SMT. In: Natural language understanding and intelligent applications, ICCPOL/NLPCC 2016, vol 10102, pp 103–115

  • Huang L, Chiang D (2007) Forest rescoring: faster decoding with integrated language models. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics, pp 144–151, Prague, Czech Republic

  • Isabelle P, Cherry C, Foster GF (2017) A challenge set approach to evaluating machine translation. CoRR. arXiv:1704.07431

  • James F (2000) Modified Kneser-Ney smoothing of n-gram models. Tech. Rep. 00.07. Research Institute for Advanced Computer Science

  • Junczys-Dowmunt M, Dwojak T, Hoang H (2016) Is neural machine translation ready for deployment? A case study on 30 translation directions. CoRR. arXiv:1610.01108

  • Junczys-Dowmunt M, Grundkiewicz R, Dwojak T, Hoang H, Heafield K, Neckermann T, Seide F, Germann U, Fikri Aji A, Bogoychev N, Martins AFT, Birch A (2018) Marian: Fast neural machine translation in C++. In: Proceedings of ACL 2018, system demonstrations, pp 116–121. Association for Computational Linguistics, Melbourne, Australia

  • Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. In: Proceedings of the 2013 conference on empirical methods in natural language processing (EMNLP), pp 1700–1709, Seattle, WA

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. CoRR. arXiv:1412.6980

  • Kinoshita S, Oshio T, Mitsuhashi T (2017) Comparison of SMT and NMT trained with large patent corpora: Japio at WAT2017. In: Proceedings of the 4th workshop on Asian translation (WAT2017), pp 140–145. Asian Federation of Natural Language Processing

  • Klubička F, Toral A, Sánchez-Cartagena VM (2017) Fine-grained human evaluation of neural versus phrase-based machine translation. CoRR. arXiv:1706.04389

  • Klubička F, Toral A, Sánchez-Cartagena VM (2018) Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian. CoRR. arXiv:1802.01451

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Lin D, Wu D (eds) Proceedings of the 2004 conference on empirical methods in natural language processing (EMNLP), pp 388–395, Barcelona, Spain

  • Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of MT Summit X: the tenth machine translation summit, pp 79–86, Phuket, Thailand

  • Koehn P, Knowles R (2017) Six challenges for neural machine translation. CoRR. arXiv:1706.03872

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: HLT-NAACL 2003: conference combining Human Language Technology conference series and the North American Chapter of the Association for Computational Linguistics conference series, pp 48–54, Edmonton, AB

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: Open source toolkit for statistical machine translation. In: ACL 2007, proceedings of the interactive poster and demonstration sessions, pp 177–180, Prague, Czech Republic

  • Kunchukuttan A, Mehta P, Bhattacharyya P (2017) The IIT Bombay English-Hindi parallel corpus. CoRR. arXiv:1710.02855

  • Lommel AR, Uszkoreit H, Burchardt A (2014) Multidimensional Quality Metrics (MQM): a framework for declaring and describing translation quality metrics. Tradumática: tecnologies de la traducció (12):455–463

  • Long Z, Utsuro T, Mitsuhashi T, Yamamoto M (2016) Translation of patent sentences with a large vocabulary of technical terms using neural machine translation. In: Proceedings of the 3rd workshop on Asian translation (WAT2016), pp 47–57, Osaka, Japan

  • Macketanz V, Avramidis E, Burchardt A, Helcl J, Srivastava A (2017) Machine translation: phrase-based, rule-based and neural approaches with linguistic evaluation. Cybern Inf Technol 17(2):28–43

  • Mitkov R (2002) Anaphora resolution. Longman, Harlow

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: ACL-2002: 40th annual meeting of the Association for Computational Linguistics, pp 311–318, Philadelphia, PA

  • Pinnis M (2015) Dynamic terminology integration methods in statistical machine translation. In: Proceedings of the 18th annual conference of the European Association for Machine Translation (EAMT 2015), pp 89–96, Antalya, Turkey

  • Pinnis M, Ljubešić N, Ştefănescu D, Skadiņa I, Tadić M, Gornostay T (2012) Term extraction, tagging, and mapping tools for under-resourced languages. In: Proceedings of the 10th conference on terminology and knowledge engineering (TKE 2012), pp 193–208, Madrid, Spain

  • Popović M (2017) Comparing language related issues for NMT and PBMT between German and English. Prague Bull Math Linguist 108(1):209–220

  • Popović M, Ney H (2011) Towards automatic error analysis of machine translation output. Comput Linguist 37(4):657–688

  • Post M, Vilar D (2018) Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In: Proceedings of the 2018 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol 1 (long papers), pp 1314–1324, New Orleans, LA

  • Press O, Wolf L (2016) Using the output embedding to improve language models. CoRR. arXiv:1608.05859

  • Rigouts Terryn A, Hoste V, Lefever E (2019) In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora. Lang Resour Eval 54:385–418

  • Sennrich R, Haddow B, Birch A (2015) Improving neural machine translation models with monolingual data. CoRR. arXiv:1511.06709

  • Sennrich R, Haddow B, Birch A (2016a) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation, pp 371–376, Berlin, Germany

  • Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics (volume 1: long papers), pp 1715–1725, Berlin, Germany

  • Shterionov D, Nagle P, Casanellas L, Superbo R, O’Dowd T (2017) Empirical evaluation of NMT and PBSMT quality for large-scale translation production. In: User track of the 20th annual conference of the European Association for Machine Translation (EAMT), pp 74–79, Prague, Czech Republic

  • Skadiņš R, Puriņš M, Skadiņa I, Vasiļjevs A (2011) Evaluation of SMT in localization to under-resourced inflected language. In: Proceedings of the 15th international conference of the European Association for Machine Translation (EAMT 2011), pp 35–40, Leuven, Belgium

  • Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th biennial conference of the Association for Machine Translation in the Americas (AMTA-2006), pp 223–231, Cambridge, MA

  • Specia L, Harris K, Blain F, Burchardt A, Macketanz V, Skadiņa I, Negri M, Turchi M (2017) Translation quality and productivity: a study on rich morphology languages. In: Proceedings of MT summit XVI, the 16th machine translation summit, pp 55–71, Nagoya, Japan

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th international conference on neural information processing systems, NIPS’14, pp 3104–3112, Montreal, Canada

  • Tiedemann J (2012) Parallel data, tools and interfaces in OPUS. In: Proceedings of the 8th international conference on language resources and evaluation (LREC’2012), pp 2214–2218, Istanbul, Turkey

  • Toral A, Sánchez-Cartagena VM (2017) A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. CoRR. arXiv:1701.02901

  • Toral A, Way A (2018) What level of quality can neural machine translation attain on literary text? In: Translation quality assessment. Springer, Cham, pp 263–287

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. CoRR. arXiv:1706.03762

  • Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1387–1392, Seattle, Washington, USA

  • Vintar Š (2018) Terminology translation accuracy in statistical versus neural MT: an evaluation for the English–Slovene language pair. In: Du J, Arčan M, Liu Q, Isahara H (eds) Proceedings of the LREC 2018 workshop MLP–MomenT: the second workshop on multi-language processing in a globalising world and the first workshop on multilingualism at the intersection of knowledge bases and machine translation, pp 34–37, Miyazaki, Japan. European Language Resources Association (ELRA), Paris

  • Way A (2018) Quality expectations of machine translation. In: Translation quality assessment: from principles to practice. Springer, Cham

  • Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser L, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR. arXiv:1609.08144

  • Yeh A (2000) More accurate tests for the statistical significance of result differences. In: Proceedings of the 18th conference on computational linguistics, vol 2, COLING 2000, pp 947–953, Saarbrücken, Germany

  • Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The United Nations parallel corpus v1.0. In: Proceedings of the tenth international conference on language resources and evaluation (LREC 2016), pp 3530–3534, Portorož, Slovenia

Acknowledgements

The ADAPT Centre for Digital Content Technology is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. This project has received funding in part from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 713567, and the publication has emanated from research supported in part by a research grant from SFI under Grant Number 13/RC/2077.

Author information

Corresponding author

Correspondence to Rejwanul Haque.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Haque, R., Hasanuzzaman, M. & Way, A. Analysing terminology translation errors in statistical and neural machine translation. Machine Translation 34, 149–195 (2020). https://doi.org/10.1007/s10590-020-09251-z

  • DOI: https://doi.org/10.1007/s10590-020-09251-z
