Guidelines for Word Alignment Evaluation and Manual Alignment

Lambert, Patrik; De Gispert, Adrià; Banchs, Rafael; Mariño, José B.

doi:10.1007/s10579-005-4822-5

Published: 12 June 2006

Language Resources and Evaluation, Volume 39, pages 267–285 (2005)

Abstract

The purpose of this paper is to provide guidelines for building a word alignment evaluation scheme. The notion of word alignment quality depends on the application; here we review standard scoring metrics for full-text alignment and explain how to use them more effectively. We discuss strategies for building a reference corpus and show that the ratio between ambiguous and unambiguous links in the reference has a great impact on the scores measured with these metrics. In particular, depending on the value of this ratio, automatically computed alignments with either higher precision or higher recall can be favoured. Finally, we suggest a strategy for building a reference corpus particularly suited to applications where recall plays a significant role, such as machine translation. We also describe the manually aligned corpus we built for the Spanish-English European Parliament corpus, which is freely available.
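
The dependence of the scores on the reference can be illustrated with a minimal sketch. The abstract does not name the metrics, so the sketch assumes the widely used precision, recall and alignment error rate (AER) defined over Sure (unambiguous) and Possible (ambiguous) reference links, in the formulation usually attributed to Och and Ney (2003). The function name `alignment_scores`, the toy link sets and the two hypothetical system outputs are illustrative only and are not taken from the article.

```python
from typing import Set, Tuple

# A link pairs a source word position with a target word position.
Link = Tuple[int, int]


def alignment_scores(sure: Set[Link], possible: Set[Link],
                     hypothesis: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) of a hypothesis alignment.

    Assumed definitions (S = sure, P = possible with S included, A = hypothesis):
        precision = |A & P| / |A|
        recall    = |A & S| / |S|
        AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    possible = possible | sure  # by convention every Sure link is also Possible
    a_s = len(hypothesis & sure)
    a_p = len(hypothesis & possible)
    precision = a_p / len(hypothesis) if hypothesis else 0.0
    recall = a_s / len(sure) if sure else 0.0
    denom = len(hypothesis) + len(sure)
    aer = 1.0 - (a_s + a_p) / denom if denom else 0.0
    return precision, recall, aer


# Two toy references over the same four links, differing only in how many
# of them are marked Sure rather than Possible.
ref_mostly_possible = ({(0, 0), (1, 1)}, {(2, 2), (3, 3)})
ref_all_sure = ({(0, 0), (1, 1), (2, 2), (3, 3)}, set())

# Two hypothetical system outputs: a cautious one and a more generous one.
high_precision = {(0, 0), (1, 1)}
high_recall = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}

for ref_name, (s, p) in [("mostly possible", ref_mostly_possible),
                         ("all sure", ref_all_sure)]:
    for hyp_name, hyp in [("high precision", high_precision),
                          ("high recall", high_recall)]:
        prec, rec, aer = alignment_scores(s, p, hyp)
        print(f"{ref_name:15s} | {hyp_name:14s} | "
              f"P={prec:.3f} R={rec:.3f} AER={aer:.3f}")
```

In this toy setting the cautious alignment obtains the lower AER against the reference with many Possible links, while the more generous alignment obtains the lower AER once the same links are all marked Sure, which is the kind of reversal the abstract describes.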

Author information

Authors and Affiliations

TALP Research Centre, Jordi Girona Salgado, 1-3, 08034, Barcelona, Spain
Patrik Lambert, Adrià De Gispert, Rafael Banchs & José B. Mariño

Corresponding author

Correspondence to Patrik Lambert.

About this article

Cite this article

Lambert, P., De Gispert, A., Banchs, R. et al. Guidelines for Word Alignment Evaluation and Manual Alignment. Lang Resources & Evaluation 39, 267–285 (2005). https://doi.org/10.1007/s10579-005-4822-5

Issue Date: December 2005