DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

Dorr, Bonnie J.; Pearl, Lisa; Hwa, Rebecca; Habash, Nizar

doi:10.1007/3-540-45820-4_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2499)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

Author information

Authors and Affiliations

Institute for Advanced Computer Studies, University of Maryland, College Park, MD, 20740
Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa & Nizar Habash

Editor information

Editors and Affiliations

Microsoft Research, 1 Microsoft Way, Redmond, WA, 98052, USA
Stephen D. Richardson

About this paper

Cite this paper

Dorr, B.J., Pearl, L., Hwa, R., Habash, N. (2002). DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_4

DOI: https://doi.org/10.1007/3-540-45820-4_4
Published: 20 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44282-0
Online ISBN: 978-3-540-45820-3
eBook Packages: Springer Book Archive
