Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models

El-Hajj, Wassim; Brahim, Ghassen Ben; Zaatari, Ayman

doi:10.1007/s13278-021-00782-8

Original Article
Published: 09 August 2021

Volume 11, article number 72, (2021)

Social Network Analysis and Mining

Abstract

Blogging websites are growing rapidly worldwide, allowing online users to express their views and engage in discussions on domains such as politics, technology, and lifestyle. While some blog posts state facts and genuine personal views, others spread rumors or support particular propaganda. This has created a need for models that automatically rate the credibility of blog posts. Arabic blog posts in particular have drawn considerable attention following the recent uprisings in the Arab world. To the best of our knowledge, little work has been done on predicting the credibility of Arabic blogs, a task that faces many challenges, including the subjectivity and complexity inherent in assessing credibility, the rich morphology of the Arabic language, and the lack of appropriate lexicons and corpora for credibility analysis. In this paper, we focus on developing a fully automated system to assess the credibility of Arabic blog posts. We collected Arabic blog posts, annotated them, extracted and reduced the important features, and then employed various machine learning models (e.g., Support Vector Machines) and deep learning models such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) under various input settings. We conclude that LSTM performs best, with accuracy reaching 74%, when the input is composed of the full blog posts along with a set of syntactic and morphological features. Incorporating hand-crafted features and adding a CNN to try to extract complex features did not improve the accuracy.

Author information

Authors and Affiliations

Computer Science Department, American University of Beirut, Beirut, Lebanon
Wassim El-Hajj & Ayman Zaatari
Computer Science Department, Prince Mohammad Bin Fahd University, Al-Khobar, Saudi Arabia
Ghassen Ben Brahim

Corresponding author

Correspondence to Wassim El-Hajj.

About this article

Cite this article

El-Hajj, W., Brahim, G.B. & Zaatari, A. Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models. Soc. Netw. Anal. Min. 11, 72 (2021). https://doi.org/10.1007/s13278-021-00782-8

Received: 22 January 2021
Revised: 04 July 2021
Accepted: 24 July 2021
Published: 09 August 2021
DOI: https://doi.org/10.1007/s13278-021-00782-8
