
Machine Translation Performance Prediction System: Optimal Prediction for Optimal Translation

Original Research · Published in SN Computer Science

Abstract

The machine translation performance prediction (MTPP) system (MTPPS) is an automatic, accurate prediction model that is independent of the language and of the natural language processing (NLP) output being evaluated. MTPPS is optimal in its capability to predict translation performance without even using the translation, relying only on the source text and thereby bypassing MT model complexity. MTPPS was cast for tasks involving textual similarity in machine translation (MT), semantic similarity, and sentence parsing. We present large-scale modeling and prediction experiments on the MTPP dataset (MTPPDAT), covering 3800 document-level and 380,000 sentence-level predictions in 7 different domains using 3800 different MT systems. We provide theoretical and experimental results, empirical lower and upper bounds on the prediction tasks, a ranking of the features used, and current results. We show that we only need 57 labeled instances at the document level and 17 at the sentence level to reach current prediction results. MTPPS achieves a \(4\%\) error rate at the document level and \(45\%\) at the sentence level relative to the magnitude of the target, is \(61\%\) and \(27\%\) relatively better than a mean predictor, respectively, and \(40\%\) better than the nearest neighbor baseline. Referential translation machines use MTPPS and achieve top results.
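The abstract reports error relative to the magnitude of the target and improvement relative to a mean predictor. The following sketch illustrates how two such measures can be computed; the exact RAE and MAER definitions used in the paper may differ from these common formulations, and the toy target/prediction values are hypothetical.

```python
def rae(y_true, y_pred):
    """Relative absolute error vs. a mean-predictor baseline (lower is better).

    A value below 1 means the predictor beats always predicting the mean.
    """
    mean = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean) for t in y_true)
    return num / den

def maer(y_true, y_pred, eps=1e-9):
    """Mean absolute error relative to the magnitude of the target
    (one plausible formulation; the paper's definition may add capping)."""
    return sum(abs(t - p) / max(abs(t), eps)
               for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical document-level translation quality targets and predictions.
y = [0.4, 0.5, 0.6, 0.7]
p = [0.42, 0.48, 0.61, 0.69]
print(rae(y, p))
print(maer(y, p))
```

Under these definitions, a predictor that is "\(61\%\) relatively better than a mean predictor" would correspond to an RAE of about 0.39.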


Data availability

The MTPPDAT dataset analyzed during this study is available in the https://github.com/bicici/MTPPDAT repository.

Notes

  1. https://github.com/bicici/relative_evaluation.

  2. Moses contains sparse features such as the translation probability of phrase table entries or their frequencies. Using about 100k sparse features improves performance by 0.3 BLEU points in German to English and by 0.9 BLEU points in Russian to English translation directions [17]. Without 300 sparse features, performance drops by 0.2 BLEU points on average over 6 translation directions [18].

  3. Available at https://github.com/bicici/MTPPDAT.

  4. Moses EMS configs are available at https://github.com/bicici/ParFDAWMT16.

  5. \(A>B\) indicates that A’s performance is statistically significantly (s.s.) better than B’s, and the symbol in figure a indicates better but not s.s. better performance, with limited transitivity of statistical equivalence based on the number of lines below. OOD followed by the symbol in figure b says that OOD is not s.s. better than the three following systems but s.s. better than the fourth.

  6. SVR results for ALL used a linear kernel instead of rbf.

  7. Test set RAE, MAER, and MRAER for QET18 are not included because test set true labels are not available.

  8. http://www.statmt.org/wmt19/qe-task.html.

References

  1. Avramidis E. Quality estimation for machine translation output using linguistic analysis and decoding features. In: Seventh workshop on statistical machine translation; 2012. p. 84–90.

  2. Biçici E, Specia L. QuEst for high quality machine translation. Prague Bull Math Linguist. 2015;103:43–64. https://doi.org/10.1515/pralin-2015-0003.


  3. Bojar O, Buck C, Chatterjee R, Federmann C, Haddow B, Huck M, Yepes JA, Kreutzer J, Logacheva V, Neveol A, Neves M, Koehn P, Monz C, Negri M, Post M, Riezler S, Sokolov A, Specia L, Verspoor K, Turchi M. Findings of the 2017 conference on machine translation (WMT17). In: Second conf. on machine translation, Copenhagen, Denmark; 2017.

  4. Biçici E, Way A. Referential translation machines for predicting semantic similarity. Lang Resour Eval. 2015. https://doi.org/10.1007/s10579-015-9322-7.


  5. Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N. Confidence estimation for machine translation. In: Coling 2004, Geneva, Switzerland; 2004. p. 315–321.

  6. Gamon M, Aue A, Smets M. Sentence-level MT evaluation without reference translations: beyond language modeling. In: 10th Conf. of the European assoc. for machine translation (EAMT), Budapest; 2005.

  7. Ravi S, Knight K, Soricut R. Automatic prediction of parser accuracy. In: Conf. on empirical methods in NLP, Stroudsburg; 2008. p. 887–896.

  8. Biçici E. Predicting the performance of parsing with referential translation machines. Prague Bull Math Ling. 2016;106:31–44. https://doi.org/10.1515/pralin-2016-0010.


  9. Soricut R, Echihabi A. Trustrank: inducing trust in automatic translations via ranking. In: 48th Annual meeting of the Assoc. for Comp. Ling.; 2010. p. 612–621.

  10. Papineni K, Roukos S, Ward T, Zhu W-J. BLEU: a method for automatic evaluation of machine translation. In: 40th Annual meeting of the Assoc. for Comp. Ling., Philadelphia; 2002. p. 311–318.

  11. Biçici E, Groves D, van Genabith J. Predicting sentence translation quality using extrinsic and language independent features. Mach Transl. 2013;27(3–4):171–92. https://doi.org/10.1007/s10590-013-9138-4.


  12. Huang F, Xu J-M, Ittycheriah A, Roukos S. Adaptive HTER estimation for document-specific MT post-editing. In: 52nd annual meeting of Assoc. for Comp. Ling.; 2014. p. 861–870.

  13. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J. A study of translation edit rate with targeted human annotation. In: Assoc. for machine translation in the Americas; 2006.

  14. Biçici E. RTM results for predicting translation performance. In: Proc. of the third conf. on machine translation (WMT18), Brussels; 2018. p. 765–769. https://aclweb.org/anthology/papers/W/W18/W18-6458/.

  15. Biçici E. Predicting translation performance with referential translation machines. In: Proc. of the second conf. on machine translation (WMT17), Copenhagen; 2017. p. 540–544. http://www.aclweb.org/anthology/W17-4759.

  16. Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E. Moses: open source toolkit for statistical machine translation. In: 45th annual meeting of the Assoc. for Comp. Ling., Prague; 2007. p. 177–180.

  17. Eidelman V, Wu K, Ture F, Resnik P, Lin J. Towards efficient large-scale feature-rich statistical machine translation. In: Eighth workshop on statistical machine translation, Sofia; 2013. p. 128–133.

  18. Haddow B, Huck M, Birch A, Bogoychev N, Koehn P. The Edinburgh/JHU phrase-based machine translation systems for WMT 2015. In: Tenth workshop on statistical machine translation, Lisbon; 2015. p. 126–133.

  19. Costa-jussà MR, Fonollosa JAR. Character-based neural machine translation. In: 54th Annual meeting of the Assoc. for Comp. Ling., Berlin; 2016. p. 357–361.

  20. Sennrich R, Haddow B, Birch A. Edinburgh neural machine translation systems for WMT 16. In: First conf. on machine translation, Berlin; 2016. p. 371–376. https://doi.org/10.18653/v1/W16-2323.

  21. Biçici E. The regression model of machine translation. PhD thesis, Koç University. 2011. Supervisor: Deniz Yuret.

  22. Biçici E. Machine translation with parfda, Moses, kenlm, nplm, and PRO. In: Proc. of the fourth conf. on machine translation (WMT19), Florence; 2019. p. 122–128. https://doi.org/10.18653/v1/W19-5306.

  23. Seginer Y. Learning syntactic structure. PhD thesis, Universiteit van Amsterdam. 2007.

  24. Biçici E. Context-based sentence alignment in parallel corpora. 9th International Conf. on Intelligent Text Processing and Computational Linguistics (CICLing 2008). Lecture Notes in Computer Science vol. 4919, p. 434–444. 2008. https://doi.org/10.1007/978-3-540-78135-6_37.

  25. Biçici E, Yuret D. Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE/ACM Transactions On Audio, Speech, and Language Processing (TASLP), vol. 23, p. 339–350. 2015. https://doi.org/10.1109/TASLP.2014.2381882.

  26. Brown PF, Pietra SAD, Pietra VJD, Mercer RL. The mathematics of statistical machine translation: parameter estimation. Comp Ling. 1993;19(2):263–311.


  27. Popović M. chrF: character n-gram F-score for automatic MT evaluation. In: Tenth workshop on statistical machine translation, Lisbon; 2015. p. 392–395.

  28. Sagemo O, Stymne S. The UU submission to the machine translation quality estimation task. In: First conf. on machine translation, Berlin; 2016. p. 825–830.

  29. Biçici E. Domain adaptation for machine translation with instance selection. Prague Bull Math Ling. 2015;103:5–20. https://doi.org/10.1515/pralin-2015-0001.


  30. Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L. Findings of the 2012 workshop on statistical machine translation. In: Seventh workshop on statistical machine translation, Montréal; 2012. p. 10–51.

  31. Biçici E. ParFDA for instance selection for statistical machine translation. In: Proc. of the first conf. on machine translation (WMT16), Berlin; 2016. p. 252–258. https://aclanthology.info/papers/W16-2306/w16-2306.

  32. Bojar O, Buck C, Federmann C, Haddow B, Koehn P, Leveling J, Monz C, Pecina P, Post M, Saint-Amand H, Soricut R, Specia L, Tamchyna A. Findings of the 2014 workshop on statistical machine translation. In: Ninth workshop on statistical machine translation, Baltimore; 2014. p. 12–58.

  33. Callison-Burch C, Koehn P, Monz C, Zaidan OF. Findings of the 2011 workshop on statistical machine translation. In: Sixth workshop on statistical machine translation, Edinburgh; 2011. p. 22–64.

  34. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14(3):199–222.


  35. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42.


  36. Bishop CM. Pattern recognition and machine learning (information science and statistics). Berlin, Heidelberg: Springer; 2006.


  37. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.


  38. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1–3):389–422.


  39. Specia L, Cancedda N, Dymetman M, Turchi M, Cristianini N. Estimating the sentence-level quality of machine translation systems. In: 13th annual conf. of the European assoc. for machine translation (EAMT), Barcelona; 2009. p. 28–35.

  40. Smola AJ, Murata N, Schölkopf B, Müller KR. Asymptotically optimal choice of \(\varepsilon\)-loss for support vector machines. In: Niklasson L, Boden M, Ziemke T, editors. Int. Conf. on Artificial Neural Networks, Berlin; 1998. p. 105–10.

  41. Kuncheva LI, Rodríguez JJ. A weighted voting framework for classifiers ensembles. Knowl Inf Syst. 2014;38(2):259–75.


  42. Perrone M, Cooper L. When networks disagree: Ensemble methods for hybrid neural networks. Technical report: Brown Univ. Providence RI Inst. for Brain and Neural Systems; 1992.

  43. Ting KM, Witten IH. Issues in stacked generalization. J Artif Intell Res. 1999;10:271–89.


  44. Polley EC, van der Laan MJ. Super learner in prediction. Technical report, U.C. Berkeley Division of Biostatistics (May 2010). https://biostats.bepress.com/ucbbiostat/paper266.

  45. Dudoit S, van der Laan MJ. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Stat Methodol. 2005;2(2):131–54. https://doi.org/10.1016/j.stamet.2005.02.003.


  46. Vapnik V. Statistical learning theory. Wiley-Interscience; 1998.

  47. NIST/SEMATECH: NIST/SEMATECH e-Handbook of Statistical Methods. 2020. http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm. http://www.itl.nist.gov/div898/handbook/.

  48. Biçici E, Yuret D. RegMT system for machine translation, system combination, and evaluation. In: Sixth workshop on statistical machine translation, Edinburgh, 2011; p. 323–329. http://www.aclweb.org/anthology/W11-2137.

  49. Biçici E. RTM at SemEval-2016 task 1: Predicting semantic similarity with referential translation machines and related statistics. In: SemEval-2016: Semantic Evaluation Exercises-Inter. Workshop on Semantic Evaluation, San Diego; 2016. p. 758–764. https://aclanthology.info/papers/S16-1117/s16-1117.

  50. Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Statist. 1998;26(5):1651–86. https://doi.org/10.1214/aos/1024691352.


  51. Kozlova A, Shmatova M, Frolov A. YSDA participation in the WMT’16 quality estimation shared task. In: First conf. on machine translation, Berlin; 2016. p. 793–799.

  52. Bojar O, Buck C, Chatterjee R, Federmann C, Guillou L, Haddow B, Huck M, Yepes JA, Neveol A, Neves M, Pecina P, Popel M, Koehn P, Monz C, Negri M, Post M, Specia L, Verspoor K, Tiedemann J, Turchi M. Findings of the 2016 conference on machine translation. In: First conf. on machine translation, Berlin; 2016.

  53. Lavie A, Agarwal A. METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Second workshop on statistical machine translation, Prague; 2007. p. 228–231.

  54. Shah K, Avramidis E, Biçici E, Specia L. QuEst-design, implementation and extensions of a framework for machine translation quality estimation. Prague Bull Math Ling. 2013;100:19–30. https://doi.org/10.2478/pralin-2013-0008.


  55. Specia L, Paetzold G, Scarton C. Multi-level translation quality prediction with QuEst++. In: Proc. of ACL-IJCNLP 2015 system demonstrations, Beijing; 2015. p. 115–120.

  56. Ive J, Blain F, Specia L. deepQuest: a framework for neural-based quality estimation. In: Proc. of the 27th intl. conf. on computational linguistics, Santa Fe; 2018. p. 3146–3157.


Funding

This study was funded by TÜBİTAK BİDEB-2232 (118C008).

Author information

Correspondence to Ergun Biçici.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Biçici, E. Machine Translation Performance Prediction System: Optimal Prediction for Optimal Translation. SN COMPUT. SCI. 3, 297 (2022). https://doi.org/10.1007/s42979-022-01183-0
