Abstract
The word sense disambiguation (WSD) task aims to identify the correct sense of an ambiguous word in a particular context. It is crucial for many applications, including machine translation, information retrieval, and semantic textual similarity. Arabic WSD faces significant challenges, primarily due to the scarcity of resources, which hinders the development of robust deep learning models. Additionally, the semantic sparsity of context further complicates the task, as Arabic words often exhibit multiple meanings. In this paper, we introduce WSDTN, a manually annotated corpus designed to fill this gap and to enable the automatic disambiguation of Arabic words. It consists of 27,530 sentences collected from different resources and spanning different domains, each with a target word and its appropriate sense. We present the corpus, describe its creation procedure for reproducibility, and provide a transformer-based baseline model that disambiguates new words and is used to evaluate the corpus. The experimental results show that the baseline approach achieves an accuracy of around 90%. The corpus is publicly available upon request and is open for extension.
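As an illustration of the corpus structure described above (a sentence, a target word, and its sense), a record and the accuracy metric could be sketched as follows. This is a minimal sketch under stated assumptions: the class name `WSDExample`, its field names, the English sense labels, and the three toy sentences are hypothetical and are not taken from WSDTN, whose actual file format is not specified in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WSDExample:
    """One annotated instance: a sentence, its target word, and the gold sense.

    Field names are hypothetical; the real corpus schema may differ.
    """
    sentence: str
    target_word: str
    sense: str


def accuracy(gold: Sequence[WSDExample], predicted: Sequence[str]) -> float:
    """Fraction of examples whose predicted sense equals the gold sense."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(ex.sense == pred for ex, pred in zip(gold, predicted))
    return correct / len(gold)


# Toy records for illustration only (not drawn from the corpus).
# The word "عين" can mean "eye" or "spring" (of water), among other senses.
examples = [
    WSDExample("شربنا من عين الماء", "عين", "spring"),
    WSDExample("أصيب في عينه اليمنى", "عين", "eye"),
    WSDExample("نبعت عين في الوادي", "عين", "spring"),
]

# Hypothetical model outputs: the third prediction is wrong on purpose.
predictions = ["spring", "eye", "eye"]

score = accuracy(examples, predictions)
print(f"accuracy = {score:.3f}")
```

In this toy run two of the three predictions match the gold senses. The reported figure of around 90% would correspond to the same ratio computed over a held-out portion of the corpus with the baseline model's predictions.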
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Saidi, R., Jarray, F., Akacha, A., Aribi, W. (2023). WSDTN a Novel Dataset for Arabic Word Sense Disambiguation. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_16
DOI: https://doi.org/10.1007/978-3-031-41774-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41773-3
Online ISBN: 978-3-031-41774-0
eBook Packages: Computer Science (R0)