What types of word alignment improve statistical machine translation?

Lambert, Patrik; Petitrenaud, Simon; Ma, Yanjun; Way, Andy

doi:10.1007/s10590-012-9123-3

Published: 10 March 2012

Volume 26, pages 289–323 (2012)

Machine Translation

Abstract

In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. However, there is a need for a systematic study of which alignment characteristics can benefit MT under specific experimental settings such as the type of MT system, the language pair or the type or size of the corpus. In this paper we perform, in each of these experimental settings, a statistical analysis of the data and study the sample correlation coefficients between a number of alignment or phrase table characteristics and variables such as the phrase table size, the number of untranslated words or the BLEU score. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese-to-English FBIS and BTEC data, and Spanish-to-English European Parliament data. We find that the alignment characteristics which help in translation greatly depend on the MT system and on the corpus size. We give alignment hints to improve BLEU score, depending on the SMT system used and the type of corpus. For example, for phrase-based SMT, dense alignments are required with larger corpora, especially on the target side, while with smaller corpora, more precise, sparser alignments are better, especially on the source side. Avoiding some long-distance crossing links may also improve BLEU score with small corpora. We take these conclusions into account to modify two types of alignment systems, and get 1 to 1.6% relative improvements in BLEU score on two held-out corpora, although the improved system is different in each corpus.
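
The kind of analysis the abstract describes, a sample correlation coefficient between an alignment characteristic and BLEU, can be illustrated with a minimal Python sketch. It assumes the Pearson form of the coefficient; the function name `sample_correlation`, the characteristic `links_per_word`, and all numeric values are hypothetical and invented for illustration only. None of them come from the paper.

```python
import math


def sample_correlation(xs, ys):
    """Pearson sample correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares; the 1/(n-1) factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration (NOT results from the paper):
# one hypothetical alignment characteristic and one BLEU score per system.
links_per_word = [0.80, 0.90, 1.00, 1.10, 1.20]
bleu = [20.1, 20.6, 21.0, 21.3, 21.9]

r = sample_correlation(links_per_word, bleu)
print(f"sample correlation r = {r:.3f}")
```

A coefficient near +1 or -1 would indicate a strong linear association between the characteristic and BLEU across the compared alignments, while a value near 0 would indicate none; it says nothing about causation on its own.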

Author information

Authors and Affiliations

LIUM, LUNAM Université, University of Le Mans, Avenue Laënnec, 72085, Le Mans Cedex 9, France
Patrik Lambert & Simon Petitrenaud
Baidu Inc., Beijing, China
Yanjun Ma
Applied Language Solutions, Delph, UK
Andy Way

Corresponding author

Correspondence to Patrik Lambert.

Additional information

P. Lambert, Y. Ma and A. Way: work partially done while at CNGL, Dublin City University, Ireland.

About this article

Cite this article

Lambert, P., Petitrenaud, S., Ma, Y. et al. What types of word alignment improve statistical machine translation? Machine Translation 26, 289–323 (2012). https://doi.org/10.1007/s10590-012-9123-3

Received: 08 April 2010
Accepted: 19 January 2012
Published: 10 March 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10590-012-9123-3
