research-article

Arabic Word Sense Disambiguation for Information Retrieval

Authors:
Mohammed Alaeddine Abderrahim

Laboratoire de Traitement Automatique de la Langue Arabe (LTALA), Tlemcen University, Tlemcen, Algeria

0000-0001-7819-2354
Mohammed El-Amine Abderrahim

Laboratoire de Traitement Automatique de la Langue Arabe (LTALA), Tlemcen University, Tlemcen, Algeria


ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 4, Article No. 69, pp. 1–19. https://doi.org/10.1145/3510451

Published: 19 January 2022


Abstract

In information retrieval systems that rely on semantic resources, the relationships and distances between concepts are important cues for word sense disambiguation. In this article, we experiment with the Conceptual Density and graph-based Random Walk methods to enhance the performance of an Arabic information retrieval system. The experiments use a medium-sized corpus. The results show that Random Walk improves the performance of the information retrieval system, with mean improvements of 13%, 16%, and 12% in recall, precision, and F-score, respectively.
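As a rough illustration of the random-walk idea named in the abstract, the sketch below runs a Personalized PageRank over a tiny hand-made sense graph and picks the candidate sense that accumulates the most rank. It is not the authors' implementation: the toy graph, the node names (`ayn_spring`, `ayn_eye`, and so on), the damping factor of 0.85, the iteration count, and the function names are all assumptions made for illustration. A real system would presumably build the graph from a lexical resource such as Arabic WordNet, which the author tags mention.

```python
from typing import Dict, Iterable, List

Graph = Dict[str, List[str]]

# Hypothetical toy graph: two candidate senses of one ambiguous word
# ("ayn_spring" and "ayn_eye") plus a few related concepts. The graph is
# undirected, so every edge is listed in both directions.
TOY_GRAPH: Graph = {
    "water": ["river", "ayn_spring", "body"],
    "river": ["water", "ayn_spring"],
    "ayn_spring": ["water", "river"],
    "body": ["water", "face"],
    "face": ["body", "ayn_eye"],
    "ayn_eye": ["face"],
}


def personalized_pagerank(
    graph: Graph,
    seeds: Iterable[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank whose teleport mass goes only to the seeds."""
    nodes = list(graph)
    seed_set = {s for s in seeds if s in graph}
    if not seed_set:
        raise ValueError("at least one seed must be a node of the graph")

    # Teleport distribution: uniform over the context (seed) concepts.
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                # Spread this node's rank evenly over its neighbours.
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Dangling node: hand its rank back to the seeds.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


def choose_sense(graph: Graph, candidates: List[str], context: List[str]) -> str:
    """Return the candidate sense with the highest rank given the context."""
    rank = personalized_pagerank(graph, context)
    return max(candidates, key=lambda sense: rank[sense])


if __name__ == "__main__":
    print(choose_sense(TOY_GRAPH, ["ayn_eye", "ayn_spring"], ["water", "river"]))
```

Seeding the walk with the context concepts biases the stationary distribution toward senses that lie close to them in the graph, which is why the sense adjacent to both context nodes outranks the one several hops away.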


Index Terms

Arabic Word Sense Disambiguation for Information Retrieval
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability


Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 4
July 2022
464 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3511099
Editor: Imed Zitouni, Google, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 January 2022
- Accepted: 1 October 2021
- Revised: 1 May 2021
- Received: 1 January 2020
Author Tags
IRS
conceptual density
PageRank
Arabic WordNet
WSD
information retrieval
Qualifiers
- research-article
- Refereed