Abstract
An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.
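The abstract does not say how Multi-Align combines alignments, so the following is only a minimal sketch of the general idea of combining two word alignments, assuming each alignment is a set of (source index, target index) links. The names `Alignment` and `combine` and the two combination modes are illustrative assumptions, not the authors' framework or API.

```python
from typing import Set, Tuple

# Hypothetical representation: one link per (source word index, target word index).
Alignment = Set[Tuple[int, int]]


def combine(statistical: Alignment, linguistic: Alignment, mode: str = "union") -> Alignment:
    """Combine two alignments of the same sentence pair.

    "intersection" keeps only links both aligners agree on (favours precision);
    "union" keeps every link proposed by either aligner (favours recall).
    These are generic set operations, assumed here for illustration only.
    """
    if mode == "intersection":
        return statistical & linguistic
    if mode == "union":
        return statistical | linguistic
    raise ValueError(f"unknown combination mode: {mode!r}")


if __name__ == "__main__":
    # Toy links for a three-word sentence pair (made-up data).
    stat_links = {(0, 0), (1, 2), (2, 1)}
    ling_links = {(0, 0), (1, 1), (2, 1)}
    print(sorted(combine(stat_links, ling_links, "intersection")))  # [(0, 0), (2, 1)]
    print(sorted(combine(stat_links, ling_links, "union")))  # [(0, 0), (1, 1), (1, 2), (2, 1)]
```

Intersection and union are the two simplest ways to merge link sets; a framework for incremental testing of combinations would presumably let such strategies be swapped and compared against a gold standard.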
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ayan, N.F., Dorr, B.J., Habash, N. (2004). Multi-Align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_3
DOI: https://doi.org/10.1007/978-3-540-30194-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive