In the Identification of Arabic Dialects: A Loss Function Ensemble Learning Based-Approach

Jamal, Salma; Khaled, Salma; Kassem, Aly M.; Eltabey, Ayaalla; Osama, Alaa; Mohamed, Samah; Elattar, Mustafa A.

doi:10.1007/978-3-031-21595-7_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13761)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

Many natural language processing tasks, including sentiment analysis, medical chatbots, Arabic speech recognition, and machine translation, would benefit greatly from a system that accurately identifies Arabic dialects, because it is useful to know a text's dialect before performing downstream tasks on it. Different Arabic-speaking nations have adopted various dialects and writing systems. Most Arab countries understand Modern Standard Arabic (MSA), which is the native language of all other Arabic dialects. In this paper, we propose a method for identifying Arabic dialects using the Arabic Online Commentary (AOC) dataset, which includes three Arabic dialects (Gulf, Levantine, and Egyptian) alongside MSA. Our approach includes two ensemble learning strategies using two BERT-based models and different loss functions such as focal loss, dice loss, and weighted cross-entropy loss. The first strategy ensembles the two proposed models, each using the loss function that performed best on the models; the second ensembles the same model trained with different loss functions, which resulted in 83.3% precision, 80.1% recall, 85.8% accuracy, and 81.45% macro-F1 on the test set.
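The three loss functions named in the abstract, and one way of combining ensemble members, can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how member predictions are combined, so the probability averaging (soft voting) here is an assumption, as are the function names, the `gamma` and `smooth` defaults, the label order, and the example logits.

```python
import math

# Label order is an assumption made for this example.
LABELS = ["MSA", "Gulf", "Levantine", "Egyptian"]

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(probs, target, class_weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -class_weights[target] * math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: down-weights examples the model already gets right."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def dice_loss(probs, target, smooth=1.0):
    """Soft dice loss for one example against a one-hot target."""
    one_hot = [1.0 if i == target else 0.0 for i in range(len(probs))]
    intersection = sum(p * y for p, y in zip(probs, one_hot))
    denom = sum(probs) + sum(one_hot)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def soft_vote(prob_lists):
    """Average the members' class probabilities and return (argmax index, averages)."""
    n = len(prob_lists)
    avg = [sum(col) / n for col in zip(*prob_lists)]
    return max(range(len(avg)), key=avg.__getitem__), avg

# Hypothetical logits from three ensemble members for a single sentence.
member_probs = [
    softmax([2.0, 0.5, 0.1, 0.3]),
    softmax([0.2, 0.1, 0.3, 2.5]),
    softmax([0.4, 0.2, 0.1, 1.5]),
]
predicted, averaged = soft_vote(member_probs)
print(LABELS[predicted], [round(p, 3) for p in averaged])
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, which is a convenient sanity check when comparing members trained with different losses.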

Acknowledgments

This research is supported by the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute.

Author information

Authors and Affiliations

School of Information Technology and Computer Science, Nile University, Giza, Egypt
Salma Jamal, Salma Khaled, Ayaalla Eltabey, Alaa Osama, Samah Mohamed & Mustafa A. Elattar
School of Computer Science, University of Windsor, Windsor, Canada
Aly M. Kassem

Corresponding author

Correspondence to Salma Jamal.

Editor information

Editors and Affiliations

Shenzhen University, Shenzhen, Guangdong, China
Philippe Fournier-Viger
Nile University, Giza, Egypt
Ahmed Hassan
ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche

About this paper

Cite this paper

Jamal, S. et al. (2023). In the Identification of Arabic Dialects: A Loss Function Ensemble Learning Based-Approach. In: Fournier-Viger, P., Hassan, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2022. Lecture Notes in Computer Science, vol 13761. Springer, Cham. https://doi.org/10.1007/978-3-031-21595-7_7

DOI: https://doi.org/10.1007/978-3-031-21595-7_7
Published: 19 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21594-0
Online ISBN: 978-3-031-21595-7
eBook Packages: Computer Science, Computer Science (R0)
