Abstract
Natural language processing (NLP) has attracted considerable research attention in recent years and is applied across many applications and disciplines. Arabic has also benefited from NLP. However, only a few Arabic datasets are available to researchers, so Arabic NLP work is limited to these datasets. This paper therefore introduces a new dataset, SNAD, collected to fill this gap, especially for classification using deep learning. The dataset contains more than 45,000 records, each consisting of a news title, the news details, and the news class, across six classes. The raw data is cleaned and preprocessed to make it better suited to classification. Finally, the dataset is validated using Convolutional Neural Networks, and the results indicate that it is effective for classification. The dataset is freely available online.
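The record layout described above (a title, the details, and a class label) can be sketched as a small data structure. This is a minimal illustration only: the field names, the `NewsRecord` type, the `class_distribution` helper, and the placeholder labels are all assumptions made here, not the dataset's actual schema or class names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsRecord:
    """One dataset row; the field names are assumed for illustration."""
    title: str
    details: str
    label: str


def class_distribution(records):
    """Count how many records fall into each class label."""
    return Counter(record.label for record in records)


# Placeholder rows and labels for illustration only; not taken from SNAD.
sample = [
    NewsRecord("title a", "details a", "class_1"),
    NewsRecord("title b", "details b", "class_2"),
    NewsRecord("title c", "details c", "class_1"),
]
distribution = class_distribution(sample)
```

Checking the per-class counts in this way is a common first step before training a classifier, since a strongly imbalanced label distribution affects how results should be read.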
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
AlSaleh, D., AlAmir, M.B., Larabi-Marie-Sainte, S. (2021). SNAD Arabic Dataset for Deep Learning. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1250. Springer, Cham. https://doi.org/10.1007/978-3-030-55180-3_47
DOI: https://doi.org/10.1007/978-3-030-55180-3_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55179-7
Online ISBN: 978-3-030-55180-3
eBook Packages: Intelligent Technologies and Robotics (R0)