Abstract
Neural machine reading comprehension models have gained immense popularity over the last decade, driven by the availability of large-scale English datasets. For Arabic, a key factor limiting neural model development is the scarcity of suitable datasets: those currently available are either too small to train deep neural models or were created by automatically translating English datasets, in which case the exact answer may not appear in the corresponding text. In this paper, we propose two high-quality, large-scale Arabic reading comprehension datasets, Arabic WikiReading and KaifLematha, with more than 100 K instances. We followed two different construction methodologies. First, we employed crowdworkers to collect non-factoid questions from Wikipedia paragraphs. Second, we constructed Arabic WikiReading following a distant supervision strategy, using the Wikidata knowledge base as ground truth. We carried out both quantitative and qualitative analyses to investigate the level of reasoning required to answer the questions in the proposed datasets. A competitive pre-trained language model attained F1 scores of 81.77 and 68.61 on the Arabic WikiReading and KaifLematha datasets, respectively, but struggled to extract precise answers for KaifLematha. Human performance on the KaifLematha development set reached an F1 score of 82.54, which leaves ample room for improvement.
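The F1 scores reported above follow the token-overlap metric that is standard in extractive question answering evaluation (as in SQuAD). A minimal sketch of that computation, assuming whitespace tokenization; the function name is illustrative, not from the paper's released code:

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and one gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

When several gold answers are collected per question (as with the three crowdsourced answers mentioned in the Notes), the per-question score is typically the maximum F1 over the available references.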
Data availability
The datasets are available at https://github.com/esulaiman/Arabic-WikiReading-and-KaifLematha-datasets.
Code availability
Not applicable.
Notes
This explanation concerns the evaluation of human performance. For the model evaluation, however, all three collected answers are considered ground truth.
Acknowledgements
The authors would like to thank the Deanship of Scientific Research at King Saud University for funding and supporting this research through the DSR Graduate Students Research Support (GSR) initiative. The authors also thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.
Funding
This research was funded by the Deanship of Scientific Research at King Saud University through the DSR Graduate Students Research Support (GSR) initiative.
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Figs. 7 and 8.
Fig. 7: Four examples from the Arabic WikiReading dataset. The first two examples share the same question, المهنة (occupation, shown in blue), with different paragraphs and answers (shown in red). The second two examples share the same question, الاسم الأول (first name), each with a different paragraph and answer.
About this article
Cite this article
Albilali, E., Al-Twairesh, N. & Hosny, M. Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha. Lang Resources & Evaluation 56, 729–764 (2022). https://doi.org/10.1007/s10579-022-09577-5