Building Online Social Network Dataset for Arabic Text Classification

  • Conference paper
The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018) (AMLTA 2018)

Abstract

Social networking sites have spread widely in recent years, and large amounts of data are shared through them in many forms: text, photos, voice, and video. These sites also let users communicate through chat, comments, and posts, and most of the exchanged content is text. The result is a large volume of data displayed to each user. This has attracted researchers to analyze the large amount of data freely available on online social networks (OSN), although most efforts focus on Twitter and on English data. Building a dataset is the most time-consuming and the most important part of the text classification process. Despite the growth in the number of Arabic users and in Arabic content on OSN, Arabic datasets collected from social networks for text classification remain scarce. In this paper, we build an Arabic social dataset for use in text classification. The dataset was gathered from Facebook and consists of 25,000 posts, collected from different Facebook pages and classified into ten categories: politics, economics, sport, religion, technology, TV, ads, foods, health, and porno. The dataset was given to ten native Arabic speakers and Facebook users to assess its validity. We used the RapidMiner tool to evaluate the performance of our dataset and obtained a classification accuracy of 95.12%.
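As a rough illustration of the accuracy figure reported above, the following Python sketch shows how classification accuracy over the ten categories could be computed from gold and predicted labels. It is not the authors' method: the paper reports using RapidMiner, and the function names `accuracy` and `per_category_recall` as well as the toy labels are assumptions made up for this example.

```python
from collections import Counter

# The ten categories named in the abstract.
CATEGORIES = ["politics", "economics", "sport", "religion", "technology",
              "TV", "ads", "foods", "health", "porno"]


def accuracy(gold, predicted):
    """Fraction of posts whose predicted category matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def per_category_recall(gold, predicted):
    """Share of each category's posts that were labeled correctly."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    # Only report categories that actually occur in the gold labels.
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}


# Invented toy labels, for illustration only (not taken from the paper's dataset).
gold = ["sport", "sport", "health", "politics", "ads"]
pred = ["sport", "health", "health", "politics", "ads"]

print(accuracy(gold, pred))
print(per_category_recall(gold, pred))
```

Reporting per-category recall alongside overall accuracy is a design choice of this sketch: with ten categories, a single accuracy number can hide a category that is classified poorly.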

Author information

Corresponding author

Correspondence to Ahmed Omar.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Omar, A., Mahmoud, T.M., Abd-El-Hafeez, T. (2018). Building Online Social Network Dataset for Arabic Text Classification. In: Hassanien, A., Tolba, M., Elhoseny, M., Mostafa, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018). AMLTA 2018. Advances in Intelligent Systems and Computing, vol 723. Springer, Cham. https://doi.org/10.1007/978-3-319-74690-6_48

  • DOI: https://doi.org/10.1007/978-3-319-74690-6_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74689-0

  • Online ISBN: 978-3-319-74690-6

  • eBook Packages: Engineering, Engineering (R0)
