Arabic question answering system: a survey

Artificial Intelligence Review

Abstract

Question answering is a subfield of information retrieval: the task of answering a question posed in natural language. A question answering system (QAS) may be considered a good alternative to search engines, which return a set of related documents rather than a direct answer. A QAS is composed of three main modules: question analysis, passage retrieval, and answer extraction. Over the years, numerous QASs have been presented for use in different languages. However, the development of Arabic QASs has been slowed by linguistic challenges and the lack of resources and tools available to researchers. In this survey, we start with the challenges posed by the language and how they make the development of new Arabic QASs more difficult. Next, we present a detailed review of several Arabic QASs. This is followed by an in-depth analysis of the techniques and approaches used in the three modules of a QAS. We present an overview of important and recent tools developed to help researchers in this field. We also cover the available Arabic and multilingual datasets and look at the different measures used to assess QASs. Finally, the survey delves into the future direction of Arabic QASs based on current state-of-the-art techniques developed for question answering in other languages.
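The three-module structure named in the abstract can be illustrated with a minimal sketch. This is a toy under stated assumptions, not a method from the survey: the function names, the `ANSWER_TYPES` and `STOPWORDS` tables, and the keyword-overlap ranking are all hypothetical, and the example uses English text for readability. An Arabic system would additionally need the morphological processing (segmentation, stemming) that the survey discusses.

```python
import re

# Hypothetical mapping from question word to expected answer type (assumption).
ANSWER_TYPES = {"when": "DATE", "who": "PERSON", "where": "LOCATION"}
# Hypothetical, deliberately tiny stopword list (assumption).
STOPWORDS = {"was", "is", "the", "a", "of", "in", "did"}


def analyze_question(question):
    """Question analysis: guess the expected answer type and pick keywords."""
    tokens = re.findall(r"\w+", question.lower())
    answer_type = next(
        (ANSWER_TYPES[t] for t in tokens if t in ANSWER_TYPES), "OTHER"
    )
    keywords = [t for t in tokens if t not in ANSWER_TYPES and t not in STOPWORDS]
    return answer_type, keywords


def retrieve_passages(keywords, passages, top_k=1):
    """Passage retrieval: rank passages by how many distinct keywords they contain."""
    def overlap(passage):
        words = set(re.findall(r"\w+", passage.lower()))
        return sum(1 for k in set(keywords) if k in words)

    return sorted(passages, key=overlap, reverse=True)[:top_k]


def extract_answer(answer_type, passage):
    """Answer extraction: toy rule that only handles DATE (a four-digit year)."""
    if answer_type == "DATE":
        match = re.search(r"\b\d{4}\b", passage)
        return match.group(0) if match else None
    return None


def answer(question, passages):
    """Chain the three modules: analysis, then retrieval, then extraction."""
    answer_type, keywords = analyze_question(question)
    for passage in retrieve_passages(keywords, passages):
        result = extract_answer(answer_type, passage)
        if result is not None:
            return result
    return None


passages = [
    "Cairo University was founded in 1908 in Giza.",
    "The Nile flows north through Egypt.",
]
print(answer("When was Cairo University founded?", passages))
```

Each stage is kept as a separate function so that any one of them (for example, the overlap-based ranking) could be swapped for a stronger technique without touching the other two.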

Acknowledgements

This work was funded by the Deanship of Scientific Research at King Saud University through research group no. RG-1441-332, for which the authors are thankful.

Corresponding author

Correspondence to Aqil M. Azmi.

Cite this article

Alwaneen, T.H., Azmi, A.M., Aboalsamh, H.A. et al. Arabic question answering system: a survey. Artif Intell Rev 55, 207–253 (2022). https://doi.org/10.1007/s10462-021-10031-1

  • DOI: https://doi.org/10.1007/s10462-021-10031-1
