ISO standard modeling of a large Arabic dictionary

Published online by Cambridge University Press: 07 September 2015

AIDA KHEMAKHEM
Affiliation:
MIRACL Laboratory, FSEGS, University of Sfax, B.P. 1088, 3018 Sfax, Tunisia e-mail: khemakhem.aida@gmail.com, bilel.gargouri@fsegs.rnu.tn
BILEL GARGOURI
Affiliation:
MIRACL Laboratory, FSEGS, University of Sfax, B.P. 1088, 3018 Sfax, Tunisia e-mail: khemakhem.aida@gmail.com, bilel.gargouri@fsegs.rnu.tn
ABDELMAJID BEN HAMADOU
Affiliation:
MIRACL Laboratory, ISIMS, University of Sfax, B.P. 242, 3021 Sakiet-Ezzit, Sfax, Tunisia e-mail: abdelmajid.benhamadou@isimsf.rnu.tn
GIL FRANCOPOULO
Affiliation:
IMMI-CNRS and Tagmatica, Rue John von Neumann, 91405 Orsay, France e-mail: gil.francopoulo@wanadoo.fr

Abstract

In this paper, we address the problem of large-coverage dictionaries of the Arabic language that are usable both for direct human reading and for automatic Natural Language Processing. For these purposes, we propose a normalized and implemented model, based on the Lexical Markup Framework (LMF, ISO 24613) and the Data Category Registry (DCR, ISO 12620), which allows stable and well-defined interoperability of lexical resources through a unification of linguistic concepts. Starting from the features of the Arabic language, and because a wide range of details and refinements must be described specifically for Arabic, we follow a fine-grained structuring strategy. Besides its rich morphological, syntactic and semantic knowledge, our model includes all the Arabic morphological patterns needed to generate the inflected forms of a given lemma, and it highlights the syntactic–semantic relations. In addition, an appropriate codification has been designed to manage all types of relationships among lexical entries and their related knowledge. Following this model, a dictionary named El Madar has been built and is now publicly available online. The data are managed through a user-friendly Web-based lexicographical workstation. This work was not done in isolation: it is the result of a collaborative effort by an international team, mainly within the ISO network, over a period of eight years.

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 
