Abstract
No machine learning technique fully meets human requirements, and each technique has its own advantages and disadvantages. Even so, the differences between the languages spoken in the world's communities, together with rapid worldwide development, have made a quick and efficient translation mechanism an urgent necessity. The purpose of this paper is therefore to shed light on some of the machine translation techniques available in the literature, and to encourage researchers to study them. We discuss some of the linguistic characteristics of the Arabic language, covering in detail the features that bear on machine translation and the difficulties they might present. We then summarize the major techniques used in machine translation from Arabic into English and discuss their strengths and weaknesses.
Cite this article
Alqudsi, A., Omar, N. & Shaker, K. Arabic machine translation: a survey. Artif Intell Rev 42, 549–572 (2014). https://doi.org/10.1007/s10462-012-9351-1
DOI: https://doi.org/10.1007/s10462-012-9351-1