English-Arabic Statistical Machine Translation: State of the Art

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Abstract

This paper presents the state of the art in statistical methods that enhance English-to-Arabic (En-Ar) Machine Translation (MT). First, the paper gives a brief history of machine translation and clarifies the obstacles it has faced, since exploring that history shows how research can develop new ideas. Second, the paper discusses Statistical Machine Translation (SMT) as an effective state-of-the-art method in the MT field. It also briefly presents the SMT pipeline and explores the En-Ar MT enhancements that have been applied by processing both sides of the parallel corpus before, after, and within the pipeline. The paper explores Arabic linguistic challenges in MT, such as orthographic, morphological, and syntactic issues. The purpose of surveying only the En-Ar translation direction in SMT is to help transfer knowledge and science to the Arabic language and to spread information to all who are interested in the Arabic language.


Author information

Corresponding author

Correspondence to Sara Ebrahim.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ebrahim, S., Hegazy, D., Mostafa, M.G.M., El-Beltagy, S.R. (2015). English-Arabic Statistical Machine Translation: State of the Art. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_39

  • DOI: https://doi.org/10.1007/978-3-319-18111-0_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18110-3

  • Online ISBN: 978-3-319-18111-0

  • eBook Packages: Computer Science, Computer Science (R0)
