Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models

  • Original Article
Social Network Analysis and Mining

Abstract

Blogging websites are growing globally at a fast pace, allowing online users to express their views and engage in discussions on domains such as politics, technology, and lifestyle. While some blog posts state facts and genuine personal views, others spread rumors or support propaganda. This has created a need for models that automatically rate the credibility of blog posts. Arabic blog posts in particular have drawn a lot of attention following the recent uprisings in the Arab world. To the best of our knowledge, little work has been done on predicting the credibility of Arabic blogs, a task that faces several challenges: the subjectivity and complexity inherent in assessing credibility, the rich morphology of the Arabic language, and the lack of appropriate lexicons and corpora for credibility analysis. In this paper, we focus on developing a fully automated system to assess the credibility of Arabic blog posts. We collected Arabic blog posts, annotated them, extracted and reduced the important features, and then employed various machine learning models, e.g., Support Vector Machines, and deep learning models, e.g., Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), under various input settings. We conclude that LSTM performs best, with accuracy reaching 74%, when the input is composed of the full blog posts along with a set of syntactic and morphological features. Incorporating hand-crafted features and adding a CNN to extract complex features did not improve the accuracy.
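
The pipeline outlined above (turn each post into features, optionally append extra features, train a classifier, report accuracy) can be illustrated with a deliberately small sketch. It is not the system described in the paper: a perceptron stands in for the SVM and LSTM models, a whitespace tokenizer stands in for Arabic morphological analysis, and the toy posts, their labels, and the `num_exclaims` feature are all invented here purely for illustration.

```python
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer (assumption: a real system would use an
    Arabic morphological analyzer instead)."""
    return text.lower().split()


def featurize(text, extra_features=None):
    """Map a post to a sparse feature dict: token counts, a bias term,
    and any optional extra (hand-crafted) features."""
    feats = Counter("tok=" + t for t in tokenize(text))
    feats["bias"] = 1.0
    for name, value in (extra_features or {}).items():
        feats["extra=" + name] = float(value)
    return dict(feats)


def score(weights, feats):
    """Dot product between sparse weights and sparse features."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_perceptron(examples, epochs=10):
    """Train on (feature dict, label) pairs with labels in {+1, -1}."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            if label * score(weights, feats) <= 0:  # misclassified: update
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + label * v
    return weights


def predict(weights, feats):
    """Return +1 (credible) or -1 (not credible)."""
    return 1 if score(weights, feats) > 0 else -1


def accuracy(weights, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(weights, f) == y for f, y in examples)
    return correct / len(examples)


# Invented toy data: (post text, hypothetical extra features, label).
TOY_POSTS = [
    ("officials confirmed the report with sources", {"num_exclaims": 0}, 1),
    ("study cites sources and data", {"num_exclaims": 0}, 1),
    ("shocking secret they hide from you", {"num_exclaims": 3}, -1),
    ("unbelievable rumor share now", {"num_exclaims": 2}, -1),
]

train_examples = [(featurize(text, extra), y) for text, extra, y in TOY_POSTS]
weights = train_perceptron(train_examples)
print("training accuracy:", accuracy(weights, train_examples))
```

Calling `featurize(text)` without the second argument gives the text-only input setting, so the same training loop can be used to compare the two settings on the same data.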

Author information

Corresponding author

Correspondence to Wassim El-Hajj.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

El-Hajj, W., Brahim, G.B. & Zaatari, A. Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models. Soc. Netw. Anal. Min. 11, 72 (2021). https://doi.org/10.1007/s13278-021-00782-8

  • DOI: https://doi.org/10.1007/s13278-021-00782-8
