
Aligning Turkish and English Parallel Texts for Statistical Machine Translation

  • Conference paper
Computer and Information Sciences - ISCIS 2005 (ISCIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3733)

Abstract

This paper presents preliminary work on aligning Turkish and English parallel texts as a step towards developing a statistical machine translation system between English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment algorithms. Results from a mere 3K sentences of parallel English–Turkish text show that we are able to link Turkish morphemes with English morphemes and function words quite successfully. We have also used the Turkish WordNet, which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.
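The pipeline described in the abstract (morphemic segmentation followed by standard statistical word alignment) can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper. Analyzer output is given in a hypothetical `root+Tag` notation, and the tags `+A3pl` and `+P1sg`, the four-sentence corpus and the `SEED_DICTIONARY` entries are all invented for illustration. IBM Model 1 trained with EM stands in for "standard word alignment algorithms", although the authors may have used other models or tools. The WordNet bootstrapping is approximated by appending root-word pairs as extra one-word sentence pairs. That trick only biases the estimates and does not strictly constrain them, so it should not be read as the authors' method.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty English token that may "generate" unaligned Turkish tokens

# Hypothetical morphemic analyses ("root+Tag+Tag" per word); a real pipeline
# would obtain these from Turkish and English morphological analyzers.
CORPUS = [
    ("ev+A3pl", "house+s"),     # evler    -> houses
    ("ev+P1sg", "my house"),    # evim     -> my house
    ("kitap+A3pl", "book+s"),   # kitaplar -> books
    ("kitap+P1sg", "my book"),  # kitabım  -> my book
]

# Hypothetical root-word pairs standing in for entries from the linked WordNets.
SEED_DICTIONARY = [("ev", "house"), ("kitap", "book")]


def to_morphemes(word: str) -> list[str]:
    """Split 'root+Tag1+Tag2' into ['root', '+Tag1', '+Tag2']."""
    root, *tags = word.split("+")
    return [root] + ["+" + tag for tag in tags]


def sentence_tokens(analyses: str) -> list[str]:
    """Flatten a space-separated sequence of word analyses into morpheme tokens."""
    return [tok for word in analyses.split() for tok in to_morphemes(word)]


def train_ibm1(pairs, iterations=10):
    """EM estimation of IBM Model 1 lexical probabilities t(tr_token | en_token)."""
    tr_vocab = {f for tr, _ in pairs for f in tr}
    t = defaultdict(lambda: 1.0 / len(tr_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for tr, en in pairs:
            en_null = [NULL] + en
            for f in tr:
                # E-step: posterior probability that each English token generated f
                norm = sum(t[(f, e)] for e in en_null)
                for e in en_null:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per English token
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(t, tr, en):
    """Viterbi alignment: link each Turkish token to its most probable English token."""
    en_null = [NULL] + en
    return [(f, max(en_null, key=lambda e: t[(f, e)])) for f in tr]


if __name__ == "__main__":
    pairs = [(sentence_tokens(tr), sentence_tokens(en)) for tr, en in CORPUS]
    # Dictionary entries appended as extra one-word "sentence" pairs (soft bias only).
    seeded = pairs + [([tr], [en]) for tr, en in SEED_DICTIONARY]
    t = train_ibm1(seeded)
    for tr, en in pairs:
        print(align(t, tr, en))
```

Because each Turkish morpheme becomes its own token, the plural tag can be linked to the English suffix `+s` and the possessive tag to the function word `my`, which is the kind of sublexical link the abstract describes. With whole words as tokens, `evim` and `kitabım` would simply be two more sparse vocabulary items. Real data would of course need genuine analyzer output and far more sentence pairs than this toy corpus.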

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

El-Kahlout, İ.D., Oflazer, K. (2005). Aligning Turkish and English Parallel Texts for Statistical Machine Translation. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_64

  • DOI: https://doi.org/10.1007/11569596_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29414-6

  • Online ISBN: 978-3-540-32085-2

  • eBook Packages: Computer Science, Computer Science (R0)
