Language-related issues for NMT and PBMT for English–German and English–Serbian

Machine Translation

Abstract

This work presents an extensive comparison of language-related problems for neural machine translation (NMT) and phrase-based machine translation (PBMT) for German-to-English, English-to-German and English-to-Serbian. The explored issues are related both to the characteristics of the languages and to the (machine) translation process and, although related, go beyond typical translation error classes. It is shown that the main advantages of the NMT approach are better generation of verb forms, avoidance of verb omissions, and better handling of English noun collocations and negation. It is also shown that the main obstacles for the NMT system are prepositions, the translation of ambiguous English (source) words, and the generation of English (target) continuous and perfect tenses. In addition, preliminary experiments show that a number of issues are complementary, i.e., they do not occur in the same segments and/or in the same form. This means that a combination or hybridisation of the NMT and PBMT approaches is a promising direction for improving both types of systems.
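
The notion of complementary issues can be made concrete with a small sketch. It assumes that, for a given issue type, each system's output has been annotated with the set of segment IDs in which the issue occurs. The annotation format, the function name `complementarity` and the toy data are hypothetical and are not taken from the article; the sketch covers only the "not in the same segments" part of the claim, not differences in form.

```python
def complementarity(nmt_segments, pbmt_segments):
    """Share of issue-bearing segments where the issue occurs in only one system.

    Both arguments are iterables of segment IDs (hypothetical annotation format).
    Returns a float in [0, 1]: 1.0 means the issue never affects the same
    segment in both systems, 0.0 means it always does.
    """
    nmt, pbmt = set(nmt_segments), set(pbmt_segments)
    union = nmt | pbmt
    if not union:
        return 0.0  # no annotated issues, so nothing could be combined
    only_one = nmt ^ pbmt  # symmetric difference: segments hit in exactly one system
    return len(only_one) / len(union)


# Toy data for illustration only; these are NOT results from the article.
nmt_preposition_issues = {3, 7, 12, 20}
pbmt_preposition_issues = {7, 15, 31}
score = complementarity(nmt_preposition_issues, pbmt_preposition_issues)
```

The measure is one minus the Jaccard overlap of the two segment sets. Under these assumptions, a value close to 1 would suggest that, for that issue, one system rarely fails on the same segments as the other, which is the situation in which combining the systems looks promising.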

Notes

  1. http://www.statmt.org/wmt16/.

  2. http://www.statmt.org/wmt16/translation-task.html.

  3. http://server1.nlp.insight-centre.org/asistent/.

  4. We acknowledge that the main deficiency of the described approach is poor scalability, since the evaluation procedure is time-consuming and also resource-intensive. Furthermore, the annotators have to be familiar with both linguistic phenomena and the translation process, and to be fluent in both the source and the target language.

  5. https://github.com/m-popovic/german-english_pbmt-nmt-issues.

Author information

Correspondence to Maja Popović.

Cite this article

Popović, M. Language-related issues for NMT and PBMT for English–German and English–Serbian. Machine Translation 32, 237–253 (2018). https://doi.org/10.1007/s10590-018-9219-5

  • DOI: https://doi.org/10.1007/s10590-018-9219-5
