Abstract
Since the era of paper editions, dictionaries have often been published with anomalies in their content, resulting from lexicographers' mistakes or from the inefficiency of automatic enrichment systems. Many of these anomalies are expensive to detect manually and difficult to control automatically, notably in lightly structured dictionary models. In this article, we take advantage of the fine-grained structure proposed by the Lexical Markup Framework (LMF) standard to investigate the detection of anomalies in the content of LMF-normalized dictionaries. First, we give a theoretical study of the plausible anomalies, such as inconsistency, incoherence, redundancy, and incompleteness. Second, we detail the approach that we propose for the automatic detection of such anomalies. Finally, we report on an experiment carried out on an available normalized dictionary of the Arabic language. The experiment shows that the proposed approach gives reasonable results in terms of precision and recall.
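To make the idea of structure-driven anomaly detection concrete, the following is a minimal sketch and not the approach evaluated in the article. It assumes a simplified LMF-style XML fragment (`LexicalEntry`, `Lemma`, `Sense`, `Definition`, and `feat` elements with `att`/`val` attributes) and uses our own simplified readings of two of the anomaly types named above: incompleteness (an entry with no part of speech, or a sense with no definition) and redundancy (two senses of the same entry whose definitions are identical after whitespace and case normalization). The sample entries, identifiers, and the helper names `feat` and `find_anomalies` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified LMF-style fragment used only for illustration.
SAMPLE = """
<LexicalResource>
  <Lexicon>
    <LexicalEntry id="e1">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="kitab"/></Lemma>
      <Sense id="e1s1"><Definition><feat att="text" val="a written work"/></Definition></Sense>
      <Sense id="e1s2"><Definition><feat att="text" val="A written  work"/></Definition></Sense>
    </LexicalEntry>
    <LexicalEntry id="e2">
      <Lemma><feat att="writtenForm" val="qalam"/></Lemma>
      <Sense id="e2s1"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feat(element, att):
    """Return the val of a direct child <feat att=...>, or None if absent."""
    if element is None:
        return None
    for f in element.findall("feat"):
        if f.get("att") == att:
            return f.get("val")
    return None


def find_anomalies(xml_text):
    """Return (anomaly_type, element_id, message) tuples in document order."""
    root = ET.fromstring(xml_text)
    anomalies = []
    for entry in root.iter("LexicalEntry"):
        entry_id = entry.get("id")
        # Incompleteness (assumed rule): every entry should carry a part of speech.
        if feat(entry, "partOfSpeech") is None:
            anomalies.append(("incompleteness", entry_id, "missing partOfSpeech"))
        seen = {}  # normalized definition text -> id of the first sense using it
        for sense in entry.findall("Sense"):
            sense_id = sense.get("id")
            text = feat(sense.find("Definition"), "text")
            # Incompleteness (assumed rule): every sense should have a definition.
            if not text:
                anomalies.append(("incompleteness", sense_id, "missing definition"))
                continue
            # Redundancy (assumed rule): identical definitions within one entry.
            key = " ".join(text.lower().split())
            if key in seen:
                anomalies.append(("redundancy", sense_id, f"same definition as {seen[key]}"))
            else:
                seen[key] = sense_id
    return anomalies


if __name__ == "__main__":
    for anomaly in find_anomalies(SAMPLE):
        print(anomaly)
```

Exact string matching only catches verbatim duplicates. Detecting inconsistency or incoherence between differently worded definitions would need linguistic knowledge or a semantic similarity measure, which this sketch does not attempt.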
Index Terms
- Evaluating the Content of LMF Standardized Dictionaries: A Practical Experiment on Arabic Language