Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Abstract

In this paper, we present an adaptation of a French and English term extractor to Modern Standard Arabic. The goal of this work is to address the lack of resources and NLP tools for the Arabic language in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while taking into account the morpho-syntactic specificities of Arabic. Agglutination phenomena are also taken into account in the term extraction process. The current state of the adapted system was evaluated on a corpus of medical texts: of the 400 maximal candidate terms examined, 288 were correct (72% precision). An error analysis shows that term extraction errors are due first to Part-of-Speech tagging errors and the difficulties induced by non-diacritised texts, and then to remaining agglutination phenomena.
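The precision figure reported in the abstract can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `precision` is an illustrative choice and is not taken from the paper or from the term extractor itself.

```python
def precision(correct: int, examined: int) -> float:
    """Share of examined candidate terms that were judged correct."""
    if examined <= 0:
        raise ValueError("examined must be a positive count")
    return correct / examined


# Counts reported in the abstract: 288 correct out of 400 examined candidates.
p = precision(288, 400)
print(f"precision = {p:.0%}")
```

Note that this measures precision over the examined candidates only; the abstract does not report recall.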

Notes

  1. http://search.cpan.org/~thhamon/Lingua-YaTeA/.

  2. noun-m-s-g-d: defined singular masculine noun in genitive case. noun-f-s-n-c: constructed singular feminine noun in nominative case.

  3. http://www.nlm.nih.gov/medlineplus/languages/all_healthtopics.html.
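The tag format glossed in note 2 can be read as five hyphen-separated fields. The sketch below decodes only the values that appear in the two examples of that note. The field order (part of speech, gender, number, case, state) is inferred from those two examples, and the function name `decode_tag` and the lookup tables are hypothetical; the full tagset used by the system is not given here.

```python
# Values attested in note 2 only; any other value is rejected rather than guessed.
GENDER = {"m": "masculine", "f": "feminine"}
NUMBER = {"s": "singular"}
CASE = {"g": "genitive", "n": "nominative"}
STATE = {"d": "defined", "c": "constructed"}


def decode_tag(tag: str) -> dict:
    """Split a tag such as 'noun-m-s-g-d' into named fields (assumed order)."""
    parts = tag.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {tag!r}")
    pos, gender, number, case, state = parts
    try:
        return {
            "pos": pos,
            "gender": GENDER[gender],
            "number": NUMBER[number],
            "case": CASE[case],
            "state": STATE[state],
        }
    except KeyError as err:
        raise ValueError(f"value {err} is not covered by note 2") from None


print(decode_tag("noun-m-s-g-d"))
print(decode_tag("noun-f-s-n-c"))
```

Raising an error on unknown values keeps the sketch from implying tag values that the note does not document.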

Corresponding author

Correspondence to Thierry Hamon.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Neifar, W., Hamon, T., Zweigenbaum, P., Khemakhem, M.E., Belguith, L.H. (2018). Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_16

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
