Abstract
SyMGiza++ — a tool that computes symmetric word alignment models with the capability to take advantage of multi-processor systems — is presented. A series of fairly simple modifications to the original IBM/Giza++ word alignment models allows to update the symmetrized models between chosen iterations of the original training algorithms. We achieve a relative alignment quality improvement of more than 17% compared to Giza++ and MGiza++ on the standard Canadian Hansards task, while maintaining the speed improvements provided by the capability of parallel computations of MGiza++.
Furthermore, the alignment models are evaluated in the context of phrase-based statistical machine translation, where a consistent improvement measured in BLEU scores can be observed when SyMGiza++ is used instead of Giza++ or MGiza++.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Computational Linguistics 29(1), 19–51 (2003)
Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2), 263–311 (1993)
Vogel, S., Ney, H., Tillmann, C.: Hmm-based word alignment in statistical translation. In: Proceedings of ACL, pp. 836–841 (1996)
Zens, R., Matusov, E., Ney, H.: Improved word alignment using a symmetric lexicon model. In: Proceedings of ACL-COLING, p. 36 (2004)
Liang, P., Taskar, B., Klein, D.: Alignment by agreement. In: Proceedings of ACL-COLING, pp. 104–111 (2006)
Gao, Q., Vogel, S.: Parallel implementations of word alignment tool. In: Proceedings of SETQA-NLP, pp. 49–57 (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistcial Society, Series B 39(1), 1–38 (1977)
Al-Onaizan, Y., Curin, J., Jahr, M., Knight, K., Lafferty, J., Melamed, I., Och, F., Purdy, D., Smith, N., Yarowsky, D.: Statistical machine translation. Technical report, JHU workshop (1999)
Matusov, E., Zens, R., Ney, H.: Symmetric word alignments for statistical machine translation. In: Proceedings of ACL-COLING, pp. 219–225 (2004)
Mihalcea, R., Pedersen, T.: An evaluation exercise for word alignment. In: Proceedings of HLT-NAACL, pp. 1–10 (2003)
Fraser, A., Marcu, D.: Measuring word alignment quality for statistical machine translation. Computational Linguistics 33, 239–303 (2007)
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open source toolkit for statistical machine translation. In: ACL (2007)
Koehn, P.: Statistical significance tests for machine translation evaluation. In: EMNLP, pp. 388–395 (2004)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Junczys-Dowmunt, M., Szał, A. (2012). SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds) Security and Intelligent Information Systems. SIIS 2011. Lecture Notes in Computer Science, vol 7053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25261-7_30
Download citation
DOI: https://doi.org/10.1007/978-3-642-25261-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25260-0
Online ISBN: 978-3-642-25261-7
eBook Packages: Computer ScienceComputer Science (R0)