
Machine Translation Performance Prediction System: Optimal Prediction for Optimal Translation

Original Research · Published in SN Computer Science

Abstract

The machine translation performance prediction (MTPP) system (MTPPS) is an automatic, accurate prediction model that is independent of the language and of the natural language processing (NLP) output being evaluated. MTPPS is optimal in its capability to predict translation performance without even using the translation, relying only on the source text and thereby bypassing MT model complexity. MTPPS was cast for tasks involving textual similarity in machine translation (MT), semantic similarity, and sentence parsing. We present large-scale modeling and prediction experiments on the MTPP dataset (MTPPDAT), covering 3800 document-level and 380,000 sentence-level predictions in 7 different domains using 3800 different MT systems. We provide theoretical and experimental results, empirical lower and upper bounds on the prediction tasks, a ranking of the features used, and current results. We show that we only need 57 labeled instances at the document level and 17 at the sentence level to reach current prediction results. MTPPS achieves a \(4\%\) error rate at the document level and \(45\%\) at the sentence level relative to the magnitude of the target, is \(61\%\) and \(27\%\) relatively better than a mean predictor, respectively, and \(40\%\) better than the nearest neighbor baseline. Referential translation machines use MTPPS and achieve top results.
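The abstract reports error relative to the magnitude of the target and improvement relative to a mean predictor. The following sketch illustrates how two such measures can be computed; the exact RAE and MAER definitions used in the paper may differ from these common formulations, and the toy target/prediction values are hypothetical.

```python
def rae(y_true, y_pred):
    """Relative absolute error vs. a mean-predictor baseline (lower is better).

    A value below 1 means the predictor beats always predicting the mean.
    """
    mean = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean) for t in y_true)
    return num / den

def maer(y_true, y_pred, eps=1e-9):
    """Mean absolute error relative to the magnitude of the target
    (one plausible formulation; the paper's definition may add capping)."""
    return sum(abs(t - p) / max(abs(t), eps)
               for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical document-level translation quality targets and predictions.
y = [0.4, 0.5, 0.6, 0.7]
p = [0.42, 0.48, 0.61, 0.69]
print(rae(y, p))
print(maer(y, p))
```

Under these definitions, a predictor that is "\(61\%\) relatively better than a mean predictor" would correspond to an RAE of about 0.39.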


Data availability

The MTPPDAT dataset analyzed during this study is available in the https://github.com/bicici/MTPPDAT repository.

Notes

  1. https://github.com/bicici/relative_evaluation.

  2. Moses contains sparse features such as the translation probability of phrase table entries or their frequencies. Using about 100k sparse features improves performance by 0.3 BLEU points in German to English and by 0.9 BLEU points in Russian to English translation directions [17]. Without 300 sparse features, performance drops by 0.2 BLEU points on average over 6 translation directions [18].

  3. Available at https://github.com/bicici/MTPPDAT.

  4. Moses EMS configs are available at https://github.com/bicici/ParFDAWMT16.

  5. \(A>B\) indicates that A’s performance is statistically significantly (s.s.) better than B’s, and the symbol in figure a indicates better but not s.s. better performance, with limited transitivity of statistical equivalence based on the number of lines below. OOD followed by the symbol in figure b says that OOD is not s.s. better than the three following systems but s.s. better than the fourth.

  6. SVR results for ALL used a linear kernel instead of rbf.

  7. Test set RAE, MAER, and MRAER for QET18 are not included because test set true labels are not available.

  8. http://www.statmt.org/wmt19/qe-task.html.

References

  1. Avramidis E. Quality estimation for machine translation output using linguistic analysis and decoding features. In: Seventh workshop on statistical machine translation; 2012. p. 84–90.

  2. Biçici E, Specia L. QuEst for high quality machine translation. Prague Bull Math Linguist. 2015;103:43–64. https://doi.org/10.1515/pralin-2015-0003.


  3. Bojar O, Buck C, Chatterjee R, Federmann C, Haddow B, Huck M, Yepes JA, Kreutzer J, Logacheva V, Neveol A, Neves M, Koehn P, Monz C, Negri M, Post M, Riezler S, Sokolov A, Specia L, Verspoor K, Turchi M. Findings of the 2017 conference on machine translation (WMT17). In: Second conf. on machine translation, Copenhagen, Denmark; 2017.

  4. Biçici E, Way A. Referential translation machines for predicting semantic similarity. Lang Resour Eval. 2015. https://doi.org/10.1007/s10579-015-9322-7.


  5. Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N. Confidence estimation for machine translation. In: Coling 2004, Geneva, Switzerland; 2004. p. 315–321.

  6. Gamon M, Aue A, Smets M. Sentence-level MT evaluation without reference translations: beyond language modeling. In: 10th Conf. of the European assoc. for machine translation (EAMT), Budapest; 2005.

  7. Ravi S, Knight K, Soricut R. Automatic prediction of parser accuracy. In: Conf. on empirical methods in NLP, Stroudsburg; 2008. p. 887–896.

  8. Biçici E. Predicting the performance of parsing with referential translation machines. Prague Bull Math Ling. 2016;106:31–44. https://doi.org/10.1515/pralin-2016-0010.


  9. Soricut R, Echihabi A. Trustrank: inducing trust in automatic translations via ranking. In: 48th Annual meeting of the Assoc. for Comp. Ling.; 2010. p. 612–621.

  10. Papineni K, Roukos S, Ward T, Zhu W-J. BLEU: a method for automatic evaluation of machine translation. In: 40th Annual meeting of the Assoc. for Comp. Ling., Philadelphia; 2002. p. 311–318.

  11. Biçici E, Groves D, van Genabith J. Predicting sentence translation quality using extrinsic and language independent features. Mach Transl. 2013;27(3–4):171–92. https://doi.org/10.1007/s10590-013-9138-4.


  12. Huang F, Xu J-M, Ittycheriah A, Roukos S. Adaptive HTER estimation for document-specific MT post-editing. In: 52nd annual meeting of Assoc. for Comp. Ling.; 2014. p. 861–870.

  13. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J. A study of translation edit rate with targeted human annotation. In: Assoc. for machine translation in the Americas; 2006.

  14. Biçici E. RTM results for predicting translation performance. In: Proc. of the third conf. on machine translation (WMT18), Brussels; 2018. p. 765–769. https://aclweb.org/anthology/papers/W/W18/W18-6458/.

  15. Biçici E. Predicting translation performance with referential translation machines. In: Proc. of the second conf. on machine translation (WMT17), Copenhagen; 2017. p. 540–544. http://www.aclweb.org/anthology/W17-4759.

  16. Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E. Moses: open source toolkit for statistical machine translation. In: 45th annual meeting of the Assoc. for Comp. Ling., Prague; 2007. p. 177–180.

  17. Eidelman V, Wu K, Ture F, Resnik P, Lin J. Towards efficient large-scale feature-rich statistical machine translation. In: Eighth workshop on statistical machine translation, Sofia; 2013. p. 128–133.

  18. Haddow B, Huck M, Birch A, Bogoychev N, Koehn P. The Edinburgh/JHU phrase-based machine translation systems for WMT 2015. In: Tenth workshop on statistical machine translation, Lisbon; 2015. p. 126–133.

  19. Costa-jussà MR, Fonollosa JAR. Character-based neural machine translation. In: 54th Annual meeting of the Assoc. for Comp. Ling., Berlin; 2016. p. 357–361.

  20. Sennrich R, Haddow B, Birch A. Edinburgh neural machine translation systems for WMT 16. In: First conf. on machine translation, Berlin; 2016. p. 371–376. https://doi.org/10.18653/v1/W16-2323.

  21. Biçici E. The regression model of machine translation. PhD thesis, Koç University. 2011. Supervisor: Deniz Yuret.

  22. Biçici E. Machine translation with parfda, Moses, kenlm, nplm, and PRO. In: Proc. of the fourth conf. on machine translation (WMT19), Florence; 2019. p. 122–128. https://doi.org/10.18653/v1/W19-5306.

  23. Seginer Y. Learning syntactic structure. PhD thesis, Universiteit van Amsterdam. 2007.

  24. Biçici E. Context-based sentence alignment in parallel corpora. 9th International Conf. on Intelligent Text Processing and Computational Linguistics (CICLing 2008). Lecture Notes in Computer Science vol. 4919, p. 434–444. 2008. https://doi.org/10.1007/978-3-540-78135-6_37.

  25. Biçici E, Yuret D. Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE/ACM Transactions On Audio, Speech, and Language Processing (TASLP), vol. 23, p. 339–350. 2015. https://doi.org/10.1109/TASLP.2014.2381882.

  26. Brown PF, Pietra SAD, Pietra VJD, Mercer RL. The mathematics of statistical machine translation: parameter estimation. Comp Ling. 1993;19(2):263–311.


  27. Popović M. chrF: character n-gram F-score for automatic MT evaluation. In: Tenth workshop on statistical machine translation, Lisbon; 2015. p. 392–395.

  28. Sagemo O, Stymne S. The UU submission to the machine translation quality estimation task. In: First conf. on machine translation, Berlin; 2016. p. 825–830.

  29. Biçici E. Domain adaptation for machine translation with instance selection. Prague Bull Math Ling. 2015;103:5–20. https://doi.org/10.1515/pralin-2015-0001.


  30. Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L. Findings of the 2012 workshop on statistical machine translation. In: Seventh workshop on statistical machine translation, Montréal; 2012. p. 10–51.

  31. Biçici E. ParFDA for instance selection for statistical machine translation. In: Proc. of the first conf. on machine translation (WMT16), Berlin; 2016. p. 252–258. https://aclanthology.info/papers/W16-2306/w16-2306.

  32. Bojar O, Buck C, Federmann C, Haddow B, Koehn P, Leveling J, Monz C, Pecina P, Post M, Saint-Amand H, Soricut R, Specia L, Tamchyna A. Findings of the 2014 workshop on statistical machine translation. In: Ninth workshop on statistical machine translation, Baltimore; 2014. p. 12–58.

  33. Callison-Burch C, Koehn P, Monz C, Zaidan OF. Findings of the 2011 workshop on statistical machine translation. In: Sixth workshop on statistical machine translation, Edinburgh; 2011. p. 22–64.

  34. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14(3):199–222.


  35. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42.


  36. Bishop CM. Pattern recognition and machine learning (information science and statistics). Berlin, Heidelberg: Springer; 2006.


  37. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.


  38. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1–3):389–422.


  39. Specia L, Cancedda N, Dymetman M, Turchi M, Cristianini N. Estimating the sentence-level quality of machine translation systems. In: 13th annual conf. of the European assoc. for machine translation (EAMT), Barcelona; 2009. p. 28–35.

  40. Smola AJ, Murata N, Schölkopf B, Müller KR. Asymptotically optimal choice of \(\varepsilon\)-loss for support vector machines. In: Niklasson L, Boden M, Ziemke T, editors. Int. Conf. on Artificial Neural Networks, Berlin; 1998. p. 105–10.

  41. Kuncheva LI, Rodríguez JJ. A weighted voting framework for classifiers ensembles. Knowl Inf Syst. 2014;38(2):259–75.


  42. Perrone M, Cooper L. When networks disagree: Ensemble methods for hybrid neural networks. Technical report: Brown Univ. Providence RI Inst. for Brain and Neural Systems; 1992.

  43. Ting KM, Witten IH. Issues in stacked generalization. J Artif Intell Res. 1999;10:271–89.


  44. Polley EC, van der Laan MJ. Super learner in prediction. Technical report, U.C. Berkeley Division of Biostatistics (May 2010). https://biostats.bepress.com/ucbbiostat/paper266.

  45. Dudoit S, van der Laan MJ. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Stat Methodol. 2005;2(2):131–54. https://doi.org/10.1016/j.stamet.2005.02.003.


  46. Vapnik V. Statistical learning theory. Wiley-Interscience; 1998.

  47. NIST/SEMATECH: NIST/SEMATECH e-Handbook of Statistical Methods. 2020. http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm. http://www.itl.nist.gov/div898/handbook/.

  48. Biçici E, Yuret D. RegMT system for machine translation, system combination, and evaluation. In: Sixth workshop on statistical machine translation, Edinburgh, 2011; p. 323–329. http://www.aclweb.org/anthology/W11-2137.

  49. Biçici E. RTM at SemEval-2016 task 1: Predicting semantic similarity with referential translation machines and related statistics. In: SemEval-2016: Semantic Evaluation Exercises-Inter. Workshop on Semantic Evaluation, San Diego; 2016. p. 758–764. https://aclanthology.info/papers/S16-1117/s16-1117.

  50. Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Statist. 1998;26(5):1651–86. https://doi.org/10.1214/aos/1024691352.


  51. Kozlova A, Shmatova M, Frolov A. YSDA participation in the WMT’16 quality estimation shared task. In: First conf. on machine translation, Berlin; 2016. p. 793–799.

  52. Bojar O, Buck C, Chatterjee R, Federmann C, Guillou L, Haddow B, Huck M, Yepes JA, Neveol A, Neves M, Pecina P, Popel M, Koehn P, Monz C, Negri M, Post M, Specia L, Verspoor K, Tiedemann J, Turchi M. Findings of the 2016 conference on machine translation. In: First conf. on machine translation, Berlin; 2016.

  53. Lavie A, Agarwal A. METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Second workshop on statistical machine translation, Prague; 2007. p. 228–231.

  54. Shah K, Avramidis E, Biçici E, Specia L. QuEst-design, implementation and extensions of a framework for machine translation quality estimation. Prague Bull Math Ling. 2013;100:19–30. https://doi.org/10.2478/pralin-2013-0008.


  55. Specia L, Paetzold G, Scarton C. Multi-level translation quality prediction with QuEst++. In: Proc. of ACL-IJCNLP 2015 system demonstrations, Beijing; 2015. p. 115–120.

  56. Ive J, Blain F, Specia L. deepQuest: a framework for neural-based quality estimation. In: Proc. of the 27th intl. conf. on computational linguistics, Santa Fe; 2018. p. 3146–3157.


Funding

This study was funded by TÜBİTAK BİDEB-2232 (118C008).

Author information

Correspondence to Ergun Biçici.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Biçici, E. Machine Translation Performance Prediction System: Optimal Prediction for Optimal Translation. SN COMPUT. SCI. 3, 297 (2022). https://doi.org/10.1007/s42979-022-01183-0
