Two-Level Alignment by Words and Phrases Based on Syntactic Information

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

As part of our work on aligning an English-Korean parallel corpus, this paper presents a statistical translation model that incorporates syntactic and phrasal information to improve translation quality. We propose three models. The first incorporates syntactic information, such as part of speech, into word-based lexical alignment. Building on it, the second model finds phrasal correspondences in the parallel corpus; phrasal mapping through chunk-based shallow parsing makes it possible to resolve mismatches between the meaningful units of the two languages. Finally, we combine these two models into a two-level alignment model that yields both word-based and phrase-based translation models. Model parameters are estimated automatically from a set of bilingual sentence pairs using the EM algorithm. Experiments show that structural relationships help construct a better translation model for structurally different languages such as Korean and English.
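
To make the EM estimation step concrete, the following is a minimal sketch of EM training for a plain word-based lexical translation table, in the style of IBM Model 1. It is not the model proposed in this paper: it ignores part-of-speech and chunk information entirely and only illustrates how expected alignment counts from bilingual sentence pairs can be turned into translation probabilities. The function name `train_model1` and the romanized toy sentence pairs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM from (english_tokens, korean_tokens) pairs.

    Illustrative baseline only: no NULL word, no POS tags, no chunks.
    """
    e_vocab = {e for es, _ in pairs for e in es}
    f_vocab = {f for _, fs in pairs for f in fs}

    # Start from a uniform translation table.
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e

        # E-step: distribute each target word over the source words
        # of its sentence in proportion to the current t(f | e).
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p

        # M-step: renormalize expected counts into probabilities.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] > 0 else 0.0

    return t


# Hypothetical toy corpus (romanized Korean), for illustration only.
pairs = [
    ("the house".split(), "geu jip".split()),
    ("the book".split(), "geu chaek".split()),
    ("a book".split(), "han chaek".split()),
]
t = train_model1(pairs, iterations=20)

# Most probable translation of each English word under the learned table.
best = {
    e: max({f for (f, e2) in t if e2 == e}, key=lambda f: t[(f, e)])
    for e in {e for es, _ in pairs for e in es}
}
```

Because "book" co-occurs with "chaek" in two sentence pairs while its competitors each appear in only one of them, the expected counts shift toward the pair ("chaek", "book") over successive iterations. A model along the lines described in the abstract would additionally condition on syntactic tags and align chunks rather than single words.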

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, S., Yoon, J., Ra, DY. (2004). Two-Level Alignment by Words and Phrases Based on Syntactic Information. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_38

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
