Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT

Ayan, Necip Fazil; Dorr, Bonnie J.; Habash, Nizar

doi:10.1007/978-3-540-30194-3_3

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Abstract

An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.
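As a rough illustration of what combining alignments can mean in practice, the sketch below merges the outputs of several aligners by voting on individual word links. It is not the Multi-Align framework itself: the `(source index, target index)` link representation, the function name `combine_alignments`, the `min_votes` parameter, and the example data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed representation: a link pairs a source word index with a target word index.
Link = Tuple[int, int]


def combine_alignments(alignments: Iterable[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` of the input aligners."""
    votes = Counter(link for alignment in alignments for link in alignment)
    return {link for link, count in votes.items() if count >= min_votes}


# Hypothetical outputs of two aligners for a single sentence pair.
statistical = {(0, 0), (1, 2), (2, 1)}
linguistic = {(0, 0), (1, 1), (1, 2)}

# min_votes equal to the number of aligners gives the intersection;
# min_votes=1 gives the union.
intersection = combine_alignments([statistical, linguistic], min_votes=2)
union = combine_alignments([statistical, linguistic], min_votes=1)
```

Under these assumptions, the intersection generally favors precision and the union favors recall; a threshold in between becomes meaningful once more than two aligners are combined.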

Author information

Authors and Affiliations

Institute for Advanced Computer Studies, University of Maryland, College Park, MD, 20742, USA
Necip Fazil Ayan & Bonnie J. Dorr
Department of Computer Science, Columbia University, New York, NY, 10027, USA
Nizar Habash

Editor information

Editors and Affiliations

Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Robert E. Frederking
Intelligence Technology Innovation Center, Washington, D.C., 20505, USA
Kathryn B. Taylor

Cite this paper

Ayan, N.F., Dorr, B.J., Habash, N. (2004). Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_3

DOI: https://doi.org/10.1007/978-3-540-30194-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive
