research-article

AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an

Published: 02 October 2020

Abstract

The absence of publicly available reusable test collections for Arabic question answering on the Holy Qur’an has made it difficult to compare the performance of systems in that domain fairly. In this article, we introduce AyaTEC, a reusable test collection for verse-based question answering on the Holy Qur’an, which serves as a common experimental testbed for this task. AyaTEC includes 207 questions (with their corresponding 1,762 answers) covering 11 topic categories of the Holy Qur’an that target the information needs of both curious and skeptical users. To the best of our efforts, the answers to the questions in AyaTEC (each represented as a sequence of verses) are exhaustive; that is, all qur’anic verses that directly answer the questions were extracted and annotated. To facilitate the use of AyaTEC in evaluating systems designed for this task, we propose several evaluation measures that support the different types of questions and the nature of verse-based answers while integrating partial matching of answers into the evaluation.
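The abstract mentions evaluation measures that give partial credit when a system's verse-based answer only partly overlaps a gold answer. The sketch below illustrates that general idea under stated assumptions. It assumes each answer is a contiguous range of verses within a single surah, and it treats partial credit as verse-level F1 against the best-matching gold answer. The names `VerseRange`, `overlap_f1`, and `best_match_score` are hypothetical. This is not the measure defined in the article.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Verse = Tuple[int, int]  # (surah number, verse number)


@dataclass(frozen=True)
class VerseRange:
    """Hypothetical answer representation: a contiguous run of verses in one surah."""
    surah: int
    start: int
    end: int  # inclusive

    def verses(self) -> Set[Verse]:
        # Expand the range into individual (surah, verse) pairs.
        return {(self.surah, v) for v in range(self.start, self.end + 1)}


def overlap_f1(predicted: VerseRange, gold: VerseRange) -> float:
    """Verse-level F1 between two answers: 1.0 for an exact match, 0.0 if disjoint."""
    p, g = predicted.verses(), gold.verses()
    shared = len(p & g)
    if shared == 0:
        return 0.0
    precision = shared / len(p)
    recall = shared / len(g)
    return 2 * precision * recall / (precision + recall)


def best_match_score(predicted: VerseRange, gold_answers: List[VerseRange]) -> float:
    """Partial credit for one predicted answer: its best overlap with any gold answer."""
    return max((overlap_f1(predicted, g) for g in gold_answers), default=0.0)
```

For example, if a question has the gold answers `VerseRange(2, 255, 255)` and `VerseRange(3, 1, 4)`, a prediction of `VerseRange(3, 2, 5)` shares three verses with the second gold answer. That gives precision 3/4 and recall 3/4, so `best_match_score` returns 0.75 instead of the 0.0 that strict exact matching would assign.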



        Published In

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 6
        November 2020
        277 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3426881
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 October 2020
        Accepted: 01 May 2020
        Revised: 01 March 2020
        Received: 01 October 2019
        Published in TALLIP Volume 19, Issue 6


        Author Tags

        1. Classical Arabic
        2. evaluation

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Qatar University through


        Article Metrics

        • Downloads (Last 12 months): 38
        • Downloads (Last 6 weeks): 7
        Reflects downloads up to 17 Jan 2025

        Cited By
        • (2024) Using the Retrieval-Augmented Generation Technique to Improve the Performance of GPT-4 in Answering Quran Questions. 2024 6th International Conference on Natural Language Processing (ICNLP), 377-381. DOI: 10.1109/ICNLP60986.2024.10692797. Online publication date: 22-Mar-2024.
        • (2024) Decoding Queries: An In-Depth Survey of Quality Techniques for Question Analysis in Arabic Question Answering Systems. IEEE Access 12, 135241-135264. DOI: 10.1109/ACCESS.2024.3458466. Online publication date: 2024.
        • (2024) Weight Averaging and re-adjustment ensemble for QRCD. Journal of King Saud University - Computer and Information Sciences 36, 4 (102037). DOI: 10.1016/j.jksuci.2024.102037. Online publication date: Apr-2024.
        • (2024) Improving ranking-based question answering with weak supervision for low-resource Qur’anic texts. Artificial Intelligence Review 58, 1. DOI: 10.1007/s10462-024-10964-3. Online publication date: 14-Nov-2024.
        • (2024) Towards an Open Domain Arabic Question Answering System: Assessment of the Bert Approach. Advances in Model and Data Engineering in the Digitalization Era, 33-46. DOI: 10.1007/978-3-031-55729-3_4. Online publication date: 21-Mar-2024.
        • (2023) Challenges and opportunities for Arabic question-answering systems: current techniques and future directions. PeerJ Computer Science 9 (e1633). DOI: 10.7717/peerj-cs.1633. Online publication date: 20-Oct-2023.
        • (2023) A comprehensive survey of techniques for developing an Arabic question answering system. PeerJ Computer Science 9 (e1413). DOI: 10.7717/peerj-cs.1413. Online publication date: 8-Jun-2023.
        • (2023) A Data-Driven Exploration of a New Islamic Fatwas Dataset for Arabic NLP Tasks. Data 8, 10 (155). DOI: 10.3390/data8100155. Online publication date: 19-Oct-2023.
        • (2023) QASiNa: Religious Domain Question Answering Using Sirah Nabawiyah. 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA), 1-6. DOI: 10.1109/ICAICTA59291.2023.10390123. Online publication date: 7-Oct-2023.
        • (2023) Developing an Open Domain Arabic Question Answering System Using a Deep Learning Technique. IEEE Access 11, 69131-69143. DOI: 10.1109/ACCESS.2023.3292190. Online publication date: 2023.