
Evaluating the Content of LMF Standardized Dictionaries: A Practical Experiment on Arabic Language

Published: 19 May 2017

Abstract

Since the age of paper editions, dictionaries have often been published with anomalies in their content, resulting from lexicographers’ mistakes or from inefficient automatic enrichment systems. Many of these anomalies are expensive to detect manually and difficult to control automatically, notably in lightly structured dictionary models. In this article, we take advantage of the fine-grained structure proposed by the Lexical Markup Framework (LMF) standard to investigate the detection of anomalies in the content of LMF-normalized dictionaries. First, we present a theoretical study of the plausible anomalies, such as inconsistency, incoherence, redundancy, and incompleteness. Second, we detail the approach that we propose for the automatic detection of such anomalies. Finally, we report on an experiment carried out on an available normalized dictionary of the Arabic language. The experiment shows that the proposed approach gives reasonable results in terms of precision and recall.


          • Published in

            ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 4
            December 2017
            146 pages
            ISSN: 2375-4699
            EISSN: 2375-4702
            DOI: 10.1145/3097269

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 May 2017
            • Accepted: 1 January 2017
            • Revised: 1 October 2016
            • Received: 1 May 2016
            Published in TALLIP, Volume 16, Issue 4


            Qualifiers

            • research-article
            • Research
            • Refereed