Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits

Neifar, Wafa; Hamon, Thierry; Zweigenbaum, Pierre; Khemakhem, Mariem Ellouze; Belguith, Lamia Hadrich

doi:10.1007/978-3-319-75477-2_16

Wafa Neifar,
Thierry Hamon (ORCID: orcid.org/0000-0002-1521-4875),
Pierre Zweigenbaum,
Mariem Ellouze Khemakhem &
Lamia Hadrich Belguith

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we present an adaptation to Modern Standard Arabic of a French and English term extractor. The goal of this work is to help address the lack of resources and NLP tools for the Arabic language in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while taking into account the morpho-syntactic specificities of Arabic. Agglutination phenomena are then taken into account in the term extraction process. The current state of the adapted system was evaluated on a medical text corpus: 400 maximal candidate terms were examined, of which 288 were correct (72% precision). An error analysis shows that term extraction errors are due first to part-of-speech tagging errors and the difficulties induced by non-diacritised texts, and then to remaining agglutination phenomena.
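The reported precision can be checked with a short calculation. The sketch below is only an illustration of that arithmetic: the function name `precision` and its argument names are assumptions made here, not part of the authors' system.

```python
def precision(correct: int, examined: int) -> float:
    """Fraction of examined candidate terms that were judged correct."""
    if examined <= 0:
        raise ValueError("examined must be a positive count")
    return correct / examined


# Figures taken from the abstract: 288 correct out of 400 examined candidates.
p = precision(288, 400)
print(f"{p:.0%}")  # prints "72%"
```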

Notes

1. http://search.cpan.org/~thhamon/Lingua-YaTeA/
2. noun-m-s-g-d: defined singular masculine noun in genitive case. noun-f-s-n-c: constructed singular feminine noun in nominative case.
3. http://www.nlm.nih.gov/medlineplus/languages/all_healthtopics.html

Author information

Authors and Affiliations

LIMSI, CNRS, Université Paris-Saclay, 91405, Orsay, France
Wafa Neifar, Thierry Hamon & Pierre Zweigenbaum
MIRACL Laboratory, Sfax University, B.P-3018, Sfax, Tunisia
Wafa Neifar, Mariem Ellouze Khemakhem & Lamia Hadrich Belguith
Université Paris 13, Sorbonne Paris Cité, 93430, Villetaneuse, France
Thierry Hamon

Corresponding author

Correspondence to Thierry Hamon.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Neifar, W., Hamon, T., Zweigenbaum, P., Khemakhem, M.E., Belguith, L.H. (2018). Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_16

DOI: https://doi.org/10.1007/978-3-319-75477-2_16
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
