WSDTN a Novel Dataset for Arabic Word Sense Disambiguation

Saidi, Rakia; Jarray, Fethi; Akacha, Asma; Aribi, Wissem

doi:10.1007/978-3-031-41774-0_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1864)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

The word sense disambiguation (WSD) task aims to find the exact sense of an ambiguous word in a particular context. It is crucial for many applications, including machine translation, information retrieval, and semantic textual similarity. Arabic WSD faces significant challenges, primarily due to the scarcity of resources, which hinders the development of robust deep learning models. Additionally, the semantic sparsity of context further complicates the task, as Arabic words often exhibit multiple meanings. In this paper, we propose WSDTN, a manually annotated corpus designed to fill this gap and to enable the automatic disambiguation of Arabic words. It consists of 27,530 sentences collected from different resources and spanning different domains, each with a target word and its appropriate sense. We present the corpus, describe its creation procedure for reproducibility, and provide a transformer-based baseline model that disambiguates new words and serves to evaluate the corpus. The experimental results show that the baseline approach achieves an accuracy of around 90%. The corpus is publicly available upon request and is open for extension.
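
The abstract describes each corpus entry as a sentence paired with a target word and its sense, and reports accuracy as the evaluation metric. The following minimal sketch shows one way such records and the accuracy computation could be represented. It is not the authors' transformer-based model: a simple most-frequent-sense baseline is swapped in for illustration. The names `WSDExample`, `train_mfs`, and `accuracy` are hypothetical, and the English toy sentences are invented, not taken from WSDTN.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WSDExample:
    """Hypothetical record layout: one sentence, one target word, one sense."""
    sentence: str  # context in which the target word appears
    target: str    # the ambiguous word to disambiguate
    sense: str     # gold sense label assigned by an annotator


def train_mfs(examples):
    """Most-frequent-sense baseline: map each target word to its commonest sense."""
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex.target][ex.sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(model, examples, fallback="UNK"):
    """Fraction of examples whose predicted sense equals the gold sense."""
    if not examples:
        return 0.0
    correct = sum(model.get(ex.target, fallback) == ex.sense for ex in examples)
    return correct / len(examples)


# Invented toy data (assumption: not from the WSDTN corpus).
train = [
    WSDExample("She sat on the bank of the river.", "bank", "riverside"),
    WSDExample("He deposited cash at the bank.", "bank", "institution"),
    WSDExample("The bank approved the loan.", "bank", "institution"),
]
test = [
    WSDExample("The bank raised its rates.", "bank", "institution"),
    WSDExample("They fished from the bank.", "bank", "riverside"),
]

model = train_mfs(train)
score = accuracy(model, test)
print(f"accuracy = {score:.2f}")
```

A context-aware model such as the transformer baseline mentioned in the abstract would replace `train_mfs`, while the record layout and the accuracy function could stay unchanged.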

Author information

Authors and Affiliations

LIMTIC Laboratory, UTM University, Tunis, Tunisia
Rakia Saidi
ESLI Laboratory, Faculty of Letters Arts and Humanities of Manouba, UMA University, Tunis, Tunisia
Asma Akacha & Wissem Aribi
Higher Institute of Computer Science of Medenine, Gabes University, Medenine, Tunisia
Fethi Jarray

Corresponding author

Correspondence to Rakia Saidi.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Eötvös Loránd University, Budapest, Hungary
János Botzheim
Eötvös Loránd University, Budapest, Hungary
László Gulyás
Universidad Complutense de Madrid, Madrid, Spain
Manuel Nunez
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Jan Treur
University of Münster, Münster, Germany
Gottfried Vossen
Wrocław University of Science and Technology, Wrocław, Poland
Adrianna Kozierkiewicz

About this paper

Cite this paper

Saidi, R., Jarray, F., Akacha, A., Aribi, W. (2023). WSDTN a Novel Dataset for Arabic Word Sense Disambiguation. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_16

DOI: https://doi.org/10.1007/978-3-031-41774-0_16
Published: 22 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41773-3
Online ISBN: 978-3-031-41774-0
eBook Packages: Computer Science (R0)
