Abstract
Many natural language processing tasks, including sentiment analysis, medical chatbots, Arabic speech recognition, and machine translation, would benefit greatly from a system that accurately identifies Arabic dialects automatically, because it is useful to know a text's dialect before performing further tasks on it. Different Arabic-speaking nations have adopted different dialects and writing systems. Most Arab countries understand Modern Standard Arabic (MSA), the common base of the other Arabic dialects. In this paper, we propose a method for identifying Arabic dialects using the Arabic Online Commentary (AOC) dataset, which covers three Arabic dialects (Gulf, Levantine, and Egyptian) alongside MSA. Our approach comprises two ensemble learning strategies built on two BERT-based models and several loss functions: focal loss, dice loss, and weighted cross-entropy loss. The first strategy ensembles the two proposed models, using the loss function that performed best on those models; the second ensembles instances of the same model trained with different loss functions, which yielded 83.3% precision, 80.1% recall, 85.8% accuracy, and 81.45% macro-F1 on the test set.
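As a rough illustration of the ingredients named in the abstract, the following Python sketch computes the three losses for a single example and combines two models' class probabilities by averaging. It is a sketch under stated assumptions, not the authors' implementation: the function names, the focal-loss exponent `gamma`, the dice smoothing term `smooth`, the class weights, and the averaging (soft-voting) rule are all hypothetical, since the abstract does not say how the ensemble members are combined or how the losses are parameterised.

```python
import math

# Label order is an assumption made for this illustration only.
LABELS = ["MSA", "Gulf", "Levantine", "Egyptian"]

def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(probs, target, class_weights):
    """Cross-entropy scaled by a per-class weight (e.g. larger for rare classes)."""
    return -class_weights[target] * math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Focal loss: down-weights examples the model already classifies well."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def dice_loss(probs, target, smooth=1.0):
    """One common soft Dice variant, computed against a one-hot target."""
    one_hot = [1.0 if i == target else 0.0 for i in range(len(probs))]
    overlap = sum(p * y for p, y in zip(probs, one_hot))
    return 1.0 - (2.0 * overlap + smooth) / (sum(probs) + sum(one_hot) + smooth)

def soft_vote(prob_lists):
    """Average the members' probabilities and return (argmax index, averages)."""
    n = len(prob_lists)
    avg = [sum(col) / n for col in zip(*prob_lists)]
    best = max(range(len(avg)), key=lambda i: avg[i])
    return best, avg

if __name__ == "__main__":
    # Hypothetical outputs of two ensemble members for one sentence.
    member_a = [0.6, 0.2, 0.1, 0.1]
    member_b = [0.1, 0.7, 0.1, 0.1]
    index, averaged = soft_vote([member_a, member_b])
    print(LABELS[index], averaged)
```

The same `soft_vote` helper covers both strategies described above: its inputs can come from two different models, or from one architecture trained several times with different loss functions.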
Acknowledgments
This research is supported by the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jamal, S. et al. (2023). In the Identification of Arabic Dialects: A Loss Function Ensemble Learning Based-Approach. In: Fournier-Viger, P., Hassan, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2022. Lecture Notes in Computer Science, vol 13761. Springer, Cham. https://doi.org/10.1007/978-3-031-21595-7_7
DOI: https://doi.org/10.1007/978-3-031-21595-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21594-0
Online ISBN: 978-3-031-21595-7
eBook Packages: Computer Science, Computer Science (R0)