Abstract
Social networking sites have spread widely in recent years, and large amounts of data are shared through them in many forms: text, photos, voice, and video. These sites also let users communicate through chat, comments, and posts, and most of the exchanged content is text. As a result, a large volume of data is displayed to each user. This freely available data on online social networks (OSNs) has attracted researchers' attention, but most analysis efforts focus on Twitter and on English data. Building a dataset is the most time-consuming and most important part of the text classification process. Despite the growing number of Arabic users and the growing volume of Arabic content on OSNs, Arabic datasets collected from social networks for text classification remain scarce. In this paper, we build an Arabic social dataset for text classification. The dataset was gathered from Facebook and consists of 25,000 posts collected from different Facebook pages and classified into ten categories: politics, economics, sport, religion, technology, TV, ads, foods, health, and porno. To evaluate its validity, the dataset was assessed by ten native Arabic speakers who are Facebook users. We used the RapidMiner tool to evaluate classification performance on our dataset and obtained a classification accuracy of 95.12%.
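The reported 95.12% is a classification accuracy: the fraction of posts whose predicted category matches the category assigned in the dataset. The abstract does not state which classifier or validation scheme was run in RapidMiner, so the following is only a minimal Python sketch of how accuracy is computed over the ten categories. The function name `accuracy` and the example labels are hypothetical and are not taken from the paper's dataset.

```python
from typing import Sequence

# The ten categories listed in the abstract.
CATEGORIES = (
    "politics", "economics", "sport", "religion", "technology",
    "TV", "ads", "foods", "health", "porno",
)


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of posts whose predicted label equals the gold label.

    Hypothetical helper for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Reject labels outside the ten-category scheme.
    for label in list(gold) + list(predicted):
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label!r}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical labels for five posts (not from the paper's dataset).
gold = ["sport", "politics", "health", "ads", "foods"]
predicted = ["sport", "politics", "health", "TV", "foods"]
print(f"{accuracy(gold, predicted):.2%}")  # 80.00%
```

Accuracy treats every misclassification equally; with ten categories of possibly unequal size, per-category precision and recall would give a finer picture, but the abstract reports accuracy only.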
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Omar, A., Mahmoud, T.M., Abd-El-Hafeez, T. (2018). Building Online Social Network Dataset for Arabic Text Classification. In: Hassanien, A., Tolba, M., Elhoseny, M., Mostafa, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018). AMLTA 2018. Advances in Intelligent Systems and Computing, vol 723. Springer, Cham. https://doi.org/10.1007/978-3-319-74690-6_48
DOI: https://doi.org/10.1007/978-3-319-74690-6_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74689-0
Online ISBN: 978-3-319-74690-6
eBook Packages: Engineering, Engineering (R0)