Abstract
This paper presents the state of the art in statistical methods that enhance English-to-Arabic (En-Ar) Machine Translation (MT). First, the paper gives a brief history of machine translation and clarifies the obstacles it faced, since exploring this history shows how research can develop new ideas. Second, the paper discusses the Statistical Machine Translation (SMT) method as an effective state-of-the-art approach in the MT field. It briefly presents the SMT pipeline and explores the En-Ar MT enhancements obtained by processing both sides of the parallel corpus before, after, and within the pipeline. The paper also examines Arabic linguistic challenges in MT, such as orthographic, morphological, and syntactic issues. The survey focuses only on the En-Ar translation direction in SMT in order to help transfer knowledge and science into Arabic and to spread information to everyone interested in the Arabic language.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ebrahim, S., Hegazy, D., Mostafa, M.G.M., El-Beltagy, S.R. (2015). English-Arabic Statistical Machine Translation: State of the Art. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_39
DOI: https://doi.org/10.1007/978-3-319-18111-0_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)