Arabic machine translation: a survey

Artificial Intelligence Review

Abstract

Although no machine learning technique fully meets human requirements, a quick and efficient translation mechanism has become an urgent necessity, given the differences between the languages spoken in the world’s communities and the rapid development that has occurred worldwide. Each technique has its own advantages and disadvantages. The purpose of this paper is therefore to shed light on some of the machine translation techniques available in the literature and to encourage researchers to study them. We discuss some of the linguistic characteristics of Arabic, covering in detail the features that bear on machine translation and the difficulties they might present. The paper then summarizes the major techniques used in machine translation from Arabic into English and discusses their strengths and weaknesses.

Author information

Corresponding author

Correspondence to Arwa Alqudsi.

About this article

Cite this article

Alqudsi, A., Omar, N. & Shaker, K. Arabic machine translation: a survey. Artif Intell Rev 42, 549–572 (2014). https://doi.org/10.1007/s10462-012-9351-1

  • DOI: https://doi.org/10.1007/s10462-012-9351-1
