Two-Level Alignment by Words and Phrases Based on Syntactic Information

Kim, Seonho; Yoon, Juntae; Ra, Dong-Yul

doi:10.1007/978-3-540-24630-5_38

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

As part of our work on aligning an English-Korean parallel corpus, this paper presents a statistical translation model that incorporates syntactic and phrasal information to improve translation. We propose three models. The first incorporates syntactic information, such as part of speech, into word-based lexical alignment. The second builds on the first to find phrasal correspondences in the parallel corpus; phrasal mapping through chunk-based shallow parsing resolves mismatches between the meaningful units of the two languages. The third is a two-level alignment model that combines the first two to construct both a word-based and a phrase-based translation model. Model parameters are estimated automatically from a set of bilingual sentence pairs using the EM algorithm. Experiments show that structural relationships help construct a better translation model for structurally different languages such as Korean and English.
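
The abstract states that parameters are estimated from bilingual sentence pairs with the EM algorithm but gives no details. As a rough illustration of what such an estimation loop looks like, here is a minimal sketch of EM for a plain word-to-word lexical model in the style of IBM Model 1 (Brown et al., 1993). It is not the authors' model: it uses no part-of-speech or chunk information, and the function name `train_ibm1`, the toy sentence pairs, and the iteration count are assumptions made for this example.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like table keyed by (f, e).
    Illustrative IBM Model 1 style sketch, not the paper's model.
    """
    src_vocab = {f for src, _ in pairs for f in src}
    # Start from a uniform distribution over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            for f in src:
                # E-step: split one unit of mass for f across the target words.
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts for each target word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


# Toy English / romanized Korean pairs, invented for this example.
pairs = [
    (["big", "house"], ["keun", "jip"]),
    (["big", "book"], ["keun", "chaek"]),
    (["small", "book"], ["jageun", "chaek"]),
]
t = train_ibm1(pairs)
for f in ("big", "house"):
    print(f, "jip", round(t[(f, "jip")], 3))
```

On this toy corpus the probability mass for "jip" shifts toward "house" over the iterations, because "big" is better explained by "keun" in the second pair. A model along the lines the abstract describes would condition these probabilities on additional syntactic information and add a phrase-level table on top.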

Author information

Authors and Affiliations

Institute of Language and Information Studies, Yonsei University, Seoul, Korea
Seonho Kim
NLP Lab., Daumsoft, Seoul, Korea
Juntae Yoon
Dept. of Computer Science, Yonsei University, Korea
Dong-Yul Ra

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

Cite this paper

Kim, S., Yoon, J., Ra, DY. (2004). Two-Level Alignment by Words and Phrases Based on Syntactic Information. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_38

DOI: https://doi.org/10.1007/978-3-540-24630-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive