Abstract
This work presents an extensive comparison of language-related problems for neural machine translation (NMT) and phrase-based machine translation (PBMT) for German-to-English, English-to-German and English-to-Serbian. The explored issues relate both to the characteristics of the languages and to the (machine) translation process and, although related to typical translation error classes, go beyond them. The main advantages of the NMT approach are shown to be better generation of verb forms, avoidance of verb omissions, and better handling of English noun collocations and negation. The main obstacles for the NMT system are shown to be prepositions, the translation of ambiguous English (source) words, and the generation of English (target) continuous and perfect tenses. In addition, preliminary experiments show that a number of issues are complementary, i.e., they do not occur in the same segments and/or in the same form. This suggests that combining or hybridising the NMT and PBMT approaches is a promising direction for improving both types of systems.
Notes
We acknowledge that the main deficiency of the described approach is poor scalability, since the evaluation procedure is time-consuming and also resource-intensive. Furthermore, the annotators have to be familiar with both linguistic phenomena and the translation process, and to be fluent in both the source and the target language.
Cite this article
Popović, M. Language-related issues for NMT and PBMT for English–German and English–Serbian. Machine Translation 32, 237–253 (2018). https://doi.org/10.1007/s10590-018-9219-5