
Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Abstract

An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ayan, N.F., Dorr, B.J., Habash, N. (2004). Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_3


  • DOI: https://doi.org/10.1007/978-3-540-30194-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23300-8

  • Online ISBN: 978-3-540-30194-3

  • eBook Packages: Springer Book Archive
