SNAD Arabic Dataset for Deep Learning

AlSaleh, Deem; AlAmir, Mashael Bin; Larabi-Marie-Sainte, Souad

doi:10.1007/978-3-030-55180-3_47


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1250)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference


Abstract

Natural language processing (NLP) has attracted considerable research attention in recent years and is applied across many applications and disciplines. Arabic has also benefited from NLP, but only a few Arabic datasets are available to researchers, which confines Arabic NLP work to those datasets. This paper therefore introduces a new dataset, SNAD, collected to help fill the gap in Arabic datasets, especially for classification with deep learning. The dataset has more than 45,000 records, each consisting of a news title, the news details, and the news class, and it covers six classes. Cleaning and preprocessing are applied to the raw data to make it better suited to classification. Finally, the dataset is validated using convolutional neural networks (CNNs), and the results are reported to be effective. The dataset is freely available online.
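As an illustration of the record layout the abstract describes (title, details, class), the following Python sketch models one record and applies a simple text cleanup. It is only a sketch under stated assumptions: the attribute names `title`, `details`, and `label` are hypothetical, and the cleaning steps shown (removing Arabic diacritics and tatweel, collapsing whitespace) are common Arabic normalization choices, not necessarily the preprocessing the authors applied.

```python
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: the abstract names three fields per record
# (title, details, class); the attribute names below are assumptions.
@dataclass
class NewsRecord:
    title: str
    details: str
    label: str


# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
# Removing them is a common normalization step, assumed here for
# illustration; the abstract does not list the actual cleaning steps.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip diacritics and tatweel, then collapse runs of whitespace."""
    text = _DIACRITICS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def clean_record(record: NewsRecord) -> NewsRecord:
    """Return a copy of the record with cleaned title and details."""
    return NewsRecord(
        title=clean_text(record.title),
        details=clean_text(record.details),
        label=record.label,
    )


def class_counts(records) -> Counter:
    """Count how many records fall into each class label."""
    return Counter(r.label for r in records)
```

Calling `class_counts` on the loaded records is a quick way to check how evenly the six classes are represented before training a classifier.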



Author information

Authors and Affiliations

College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
Deem AlSaleh, Mashael Bin AlAmir & Souad Larabi-Marie-Sainte


Corresponding authors

Correspondence to Deem AlSaleh, Mashael Bin AlAmir or Souad Larabi-Marie-Sainte.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia


About this paper

Cite this paper

AlSaleh, D., AlAmir, M.B., Larabi-Marie-Sainte, S. (2021). SNAD Arabic Dataset for Deep Learning. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1250. Springer, Cham. https://doi.org/10.1007/978-3-030-55180-3_47


DOI: https://doi.org/10.1007/978-3-030-55180-3_47
Published: 25 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55179-7
Online ISBN: 978-3-030-55180-3
eBook Packages: Intelligent Technologies and Robotics (R0)
