Abstract
The VoiceTRAN statistical machine translation (SMT) system was built from freely available tools and language resources. Several configurations of the system are presented and evaluated. The VoiceTRAN SMT system outperformed the baseline conventional rule-based MT system in both English-Slovenian in-domain test setups. To further improve the generalization capability of the translation model for lower-coverage out-of-domain test sentences, an “MSD-recombination” approach is proposed. This approach not only makes better use of conventional translation models, but also performs well in the more demanding translation direction, that is, into a highly inflectional language. Using this approach in the out-of-domain setup of the English-Slovenian JRC-ACQUIS task, we achieved significant improvements in translation quality.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Žganec-Gros, J., Gruden, S. (2008). MSD Recombination for Statistical Machine Translation into Highly-Inflected Languages. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_31
DOI: https://doi.org/10.1007/978-3-540-87391-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science (R0)