Evaluating the Content of LMF Standardized Dictionaries: A Practical Experiment on Arabic Language

Authors:
Wafa Wali (MIRACL Laboratory, University of Sfax),
Bilel Gargouri (MIRACL Laboratory, University of Sfax),
Abdelmajid Ben Hamadou (MIRACL Laboratory, University of Sfax)

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 4, Article No. 22, pp. 1–20. https://doi.org/10.1145/3047406

Published: 19 May 2017

Abstract

Since the era of print editions, dictionaries have often been published with anomalies in their content, resulting from lexicographers' mistakes or from the inefficiency of automatic enrichment systems. Many of these anomalies are expensive to detect manually and difficult to control automatically, notably in lightly structured dictionary models. In this article, we take advantage of the fine-grained structure proposed by the Lexical Markup Framework (LMF) standard to investigate the detection of anomalies in the content of LMF-normalized dictionaries. First, we present a theoretical study of the plausible anomalies, such as inconsistency, incoherence, redundancy, and incompleteness. Second, we detail the approach that we propose for the automatic detection of such anomalies. Finally, we report on an experiment carried out on an available normalized dictionary of the Arabic language. The experiment shows that the proposed approach gives reasonable results in terms of precision and recall.
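To make the kinds of anomaly named above more concrete, the following is a minimal sketch, not the authors' method. It assumes a toy lexicon that uses LMF-style element names (`LexicalEntry`, `Lemma`, `Sense`, `Definition`, and `feat` elements with `att`/`val` attributes); the sample entries, the two rules (a missing part of speech or definition counts as incompleteness, a repeated definition within one entry counts as redundancy), and the function names `feat`, `detect_anomalies`, and `precision_recall` are all assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Invented toy lexicon with LMF-style element names (illustrative only).
SAMPLE = """
<LexicalResource>
  <Lexicon>
    <LexicalEntry id="e1">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="kitab"/></Lemma>
      <Sense id="e1s1"><Definition><feat att="text" val="a written work"/></Definition></Sense>
      <Sense id="e1s2"><Definition><feat att="text" val="A written work"/></Definition></Sense>
    </LexicalEntry>
    <LexicalEntry id="e2">
      <Lemma><feat att="writtenForm" val="qalam"/></Lemma>
      <Sense id="e2s1"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feat(element, name):
    """Return the val of a direct <feat att=name> child, or None if absent."""
    if element is None:
        return None
    for f in element.findall("feat"):
        if f.get("att") == name:
            return f.get("val")
    return None


def detect_anomalies(xml_text):
    """Return a set of (entry_id, anomaly_kind, detail) tuples."""
    found = set()
    root = ET.fromstring(xml_text.strip())
    for entry in root.iter("LexicalEntry"):
        eid = entry.get("id")
        # Assumed incompleteness rule: every entry should carry a part of speech.
        if not feat(entry, "partOfSpeech"):
            found.add((eid, "incompleteness", "missing partOfSpeech"))
        seen = {}  # normalized definition text -> id of the first sense using it
        for sense in entry.findall("Sense"):
            sid = sense.get("id")
            text = feat(sense.find("Definition"), "text")
            # Assumed incompleteness rule: every sense should have a definition.
            if not text:
                found.add((eid, "incompleteness", f"sense {sid} has no definition"))
                continue
            # Assumed redundancy rule: same definition text, ignoring case and spacing.
            key = " ".join(text.lower().split())
            if key in seen:
                found.add((eid, "redundancy", f"{sid} repeats {seen[key]}"))
            else:
                seen[key] = sid
    return found


def precision_recall(detected, reference):
    """Compare detected anomalies with a manually established reference set."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    for anomaly in sorted(detect_anomalies(SAMPLE)):
        print(anomaly)
```

Inconsistency and incoherence are left out of this sketch because checking them requires comparing the meaning of different knowledge elements, which simple structural rules like these cannot capture.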

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 4
December 2017, 146 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3097269
Editor: Nianwen Xue, Brandeis University, Waltham, USA
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 May 2017
- Accepted: 1 January 2017
- Revised: 1 October 2016
- Received: 1 May 2016
Author Tags
Arabic language
LMF standardized dictionaries
LMF-ISO 24613
anomalies’ detection
Qualifiers
- research-article
- Research
- Refereed