TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media

Djebbi, Mohamed Amine; Ouersighni, Riadh

doi:10.1007/978-3-031-16014-1_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13501)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

The massive use of social networks has recently opened new research avenues in data mining and decision-making. One of the most relevant forms of user-generated data in social media is unstructured text that expresses users' emotions about a given topic. Analyzing this new form of writing to extract valuable information is a challenging task and could be of great interest in several fields, such as healthcare, business intelligence, and marketing strategy, to name but a few. This article considers topic and polarity extraction applied to Online Social Media (OSM) analysis, for the benefit of numerous application domains. Implementing sentiment analysis and topic extraction algorithms to detect the polarity of a given comment towards a certain topic requires sophisticated supervised machine learning and deep learning models and, at the same time, collecting, preparing, and annotating a huge amount of data to train those models.

In this paper, we propose a dataset that can be used to extract both topic and polarity features from the dialectal messages of everyday Tunisian electronic writing across the most popular OSM networks. We collected our data by crawling the text of posts and comments from Facebook, Twitter, and YouTube using the corresponding network's graph API. We describe the whole pipeline used to prepare our corpus, as well as the setup and results of the extensive experiments conducted to evaluate the generated dataset. To the best of our knowledge, the proposed multivariate (topic and polarity) Arabic dataset of Tunisian dialect is the first of its kind introduced to the NLP community, and we have made it publicly available on GitHub (https://github.com/DescoveryAmine/TunTap).
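To make the idea of a jointly annotated (topic and polarity) corpus concrete, the following is a minimal sketch, not the authors' pipeline or models. It assumes a hypothetical record layout with `text`, `topic`, and `polarity` fields, uses invented English placeholder rows rather than real TunTap data, and swaps in a simple multinomial Naive Bayes baseline trained once per label dimension.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Record:
    """One annotated message; the field names are hypothetical."""
    text: str
    topic: str
    polarity: str


# Placeholder rows invented for illustration only; they are NOT taken from TunTap.
TOY_ROWS = [
    Record("great match tonight", "sport", "positive"),
    Record("terrible match referee", "sport", "negative"),
    Record("great speech minister", "politics", "positive"),
    Record("terrible decision minister", "politics", "negative"),
]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the label
            for word in text.split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train(rows):
    """Fit one classifier per annotation dimension on the same texts."""
    texts = [r.text for r in rows]
    topic_model = NaiveBayes().fit(texts, [r.topic for r in rows])
    polarity_model = NaiveBayes().fit(texts, [r.polarity for r in rows])
    return topic_model, polarity_model


def predict(models, text):
    """Return a (topic, polarity) pair for a new message."""
    topic_model, polarity_model = models
    return topic_model.predict(text), polarity_model.predict(text)
```

Calling `predict(train(TOY_ROWS), "great match")` returns one label per dimension for the same message. Training two independent classifiers is only one possible design; a real experiment on Tunisian dialect text would also need dialect-aware tokenization and normalization, which this sketch omits.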

Author information

Authors and Affiliations

Sciences and Technologies of Defense LR19DN01 (STD), La Marsa, Tunisia
Mohamed Amine Djebbi & Riadh Ouersighni
CRM Military Research Center, 2045, Taieb Mhiri street, ElAouina, Tunisia
Mohamed Amine Djebbi & Riadh Ouersighni

Corresponding author

Correspondence to Riadh Ouersighni.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
University of Pau and Pays de l'Adour, Anglet, France
Richard Chbeir
Wrocław University of Science and Technology, Wrocław, Poland
Adrianna Kozierkiewicz
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Djebbi, M.A., Ouersighni, R. (2022). TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media. In: Nguyen, N.T., Manolopoulos, Y., Chbeir, R., Kozierkiewicz, A., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2022. Lecture Notes in Computer Science, vol 13501. Springer, Cham. https://doi.org/10.1007/978-3-031-16014-1_40

DOI: https://doi.org/10.1007/978-3-031-16014-1_40
Published: 21 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16013-4
Online ISBN: 978-3-031-16014-1
eBook Packages: Computer Science, Computer Science (R0)
