Building TALAA-AFAQ, a Corpus of Arabic FActoid Question-Answers for a Question Answering System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Abstract

In this paper, we describe the development of TALAA-AFAQ, a Corpus of Arabic Factoid Question Answers built for use in the training modules of an Arabic Question Answering System (AQAS). The corpus is built in five steps, during which we extract syntactic and semantic features along with other information. In addition, we extract a set of answer patterns for each question from the web. The corpus contains 2002 question-answer pairs, of which 618 have answer patterns. It is divided into four main classes and 34 finer categories. All answer patterns and features have been validated by experts in Arabic. To the best of our knowledge, this is the first corpus of Arabic Factoid Question Answers built specifically to support the development of Arabic Question Answering Systems.

Notes

  1. http://users.dsic.upv.es/grupos/nle/?file=kop4.php

  2. http://tahadz.com/mishkal/

Author information

Corresponding author

Correspondence to Asma Aouichat.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Aouichat, A., Guessoum, A. (2017). Building TALAA-AFAQ, a Corpus of Arabic FActoid Question-Answers for a Question Answering System. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_46

  • DOI: https://doi.org/10.1007/978-3-319-59569-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59568-9

  • Online ISBN: 978-3-319-59569-6

  • eBook Packages: Computer Science, Computer Science (R0)
