Abstract
Question answering is a subfield of information retrieval. It is the task of answering a question posed in natural language. A question answering system (QAS) may be considered a good alternative to search engines that return a set of related documents. A QAS is composed of three main modules: question analysis, passage retrieval, and answer extraction. Over the years, numerous QASs have been presented for use in different languages. However, the development of Arabic QASs has been slowed by linguistic challenges and by the lack of resources and tools available to researchers. In this survey, we start with the challenges posed by the language and how they make the development of new Arabic QASs more difficult. Next, we review several Arabic QASs in detail. This is followed by an in-depth analysis of the techniques and approaches used in the three modules of a QAS. We present an overview of important and recent tools that were developed to help researchers in this field. We also cover the available Arabic and multilingual datasets and look at the different measures used to assess QASs. Finally, the survey delves into the future direction of Arabic QASs based on the current state-of-the-art techniques developed for question answering in other languages.
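The three-module organization named in the abstract can be illustrated with a minimal sketch. It is not taken from any system covered by the survey: keyword overlap stands in for real passage retrieval and answer extraction, English tokens are used only for readability, and every name below (`analyze_question`, `retrieve_passages`, `extract_answer`, the question-word table) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lookup tables; an Arabic QAS would use Arabic interrogatives
# and a proper stop-word list instead.
ANSWER_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}
STOPWORDS = {"the", "is", "of", "a", "an", "in", "was", "did"}


@dataclass
class AnalyzedQuestion:
    answer_type: str      # expected type of the answer
    keywords: List[str]   # content words used for retrieval


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization with trailing punctuation stripped."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def analyze_question(question: str) -> AnalyzedQuestion:
    """Module 1: guess the answer type and keep the content words."""
    tokens = tokenize(question)
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS and t not in ANSWER_TYPES]
    return AnalyzedQuestion(answer_type, keywords)


def overlap(text: str, keywords: List[str]) -> int:
    """Number of distinct keywords that occur in the text."""
    return len(set(tokenize(text)) & set(keywords))


def retrieve_passages(q: AnalyzedQuestion, passages: List[str], top_k: int = 1) -> List[str]:
    """Module 2: rank passages by keyword overlap and keep the best top_k."""
    ranked = sorted(passages, key=lambda p: overlap(p, q.keywords), reverse=True)
    return ranked[:top_k]


def extract_answer(q: AnalyzedQuestion, passage: str) -> str:
    """Module 3: return the sentence that shares the most keywords."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: overlap(s, q.keywords))


def answer(question: str, passages: List[str]) -> Optional[str]:
    """Run the three modules in sequence; None if there is nothing to search."""
    if not passages:
        return None
    analyzed = analyze_question(question)
    best = retrieve_passages(analyzed, passages, top_k=1)[0]
    return extract_answer(analyzed, best)
```

Calling `answer("Where is King Saud University?", passages)` on a small list of passages runs the three stages in order. In a real system, each stage would be replaced by the techniques the survey reviews, and the output of question analysis (answer type and keywords) would constrain both later stages.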





Acknowledgements
This work was funded by the Deanship of Scientific Research at King Saud University through research group no. RG-1441-332, for which the authors are thankful.
Cite this article
Alwaneen, T.H., Azmi, A.M., Aboalsamh, H.A. et al. Arabic question answering system: a survey. Artif Intell Rev 55, 207–253 (2022). https://doi.org/10.1007/s10462-021-10031-1
DOI: https://doi.org/10.1007/s10462-021-10031-1