Abstract
As part of our work on aligning an English-Korean parallel corpus, this paper presents a statistical translation model that incorporates syntactic and phrasal information to improve translation quality. We propose three models. The first incorporates syntactic information, such as part of speech, into word-based lexical alignment. The second builds on the first and finds phrasal correspondences in the parallel corpus; phrasal mapping through chunk-based shallow parsing makes it possible to resolve mismatches between the meaningful units of the two languages. The third is a two-level alignment model that combines the first two in order to construct both a word-based and a phrase-based translation model. Model parameters are estimated automatically from a set of bilingual sentence pairs using the EM algorithm. Experiments show that structural relationships help construct a better translation model for structurally different languages such as Korean and English.
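To make the EM-based parameter estimation concrete, the following is a minimal sketch of EM training for a plain word-based lexical translation table in the style of the classic IBM Model 1. It is not the paper's model: the part-of-speech and chunk information described in the abstract is left out, and the function name `train_model1`, the `<NULL>` token, and the romanized toy sentence pairs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM from (source_tokens, target_tokens) pairs.

    Assumption: a plain word-to-word model with a NULL target word;
    no part-of-speech or chunk information is used.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked

        # E-step: distribute each source word over the target words
        # of its sentence pair, in proportion to the current t(f | e).
        for fs, es in pairs:
            es_null = ["<NULL>"] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize the expected counts per target word.
        t = defaultdict(
            float,
            {(f, e): count[(f, e)] / total[e] for (f, e) in count},
        )
    return t


# Hypothetical toy corpus: romanized Korean source, English target.
toy_pairs = [
    (["keun", "jip"], ["big", "house"]),
    (["keun", "chaek"], ["big", "book"]),
    (["jageun", "chaek"], ["small", "book"]),
]
table = train_model1(toy_pairs)
```

On this toy corpus the table should come to prefer linking "jip" with "house" over "big", because "big" also has to account for "keun" and "chaek" in the other sentence pairs. A two-level model of the kind the abstract describes would add a further alignment step over chunks, which this sketch does not attempt.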
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, S., Yoon, J., Ra, DY. (2004). Two-Level Alignment by Words and Phrases Based on Syntactic Information. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_38
DOI: https://doi.org/10.1007/978-3-540-24630-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive