Abstract
Blogging websites are growing rapidly worldwide, allowing online users to express their views and engage in discussions on domains such as politics, technology, and lifestyle. While some blog posts state facts and genuine personal views, others spread rumors or support propaganda. This has created a need for models that automatically rate the credibility of blog posts. Arabic blog posts, in particular, have drawn considerable attention following the recent uprisings in the Arab world. To the best of our knowledge, little work has been done to predict the credibility of Arabic blogs, a task that faces many challenges, including the subjectivity and complexity inherent in assessing credibility, the rich morphology of the Arabic language, and the lack of appropriate lexicons and corpora for credibility analysis. In this paper, we focus on developing a fully automated system to assess the credibility of Arabic blog posts. We collected Arabic blog posts, annotated them, extracted and reduced the important features, and then employed various machine learning models (e.g., Support Vector Machines) and deep learning models (e.g., Long Short-Term Memory, LSTM, and Convolutional Neural Network, CNN) under various input settings. We conclude that LSTM performs best, with accuracy reaching 74%, when the input is composed of the full blog posts along with a set of syntactic and morphological features. Incorporating hand-crafted features and adding a CNN to extract more complex features did not improve the accuracy.
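To make the pipeline summarized above more concrete (text features, a few hand-crafted cues, a classifier, and an accuracy score), here is a minimal sketch. It is not the system described in the paper: a plain perceptron stands in for the SVM and LSTM models, the posts are English placeholders rather than annotated Arabic blog posts, and the two hand-crafted cues (exclamation marks, presence of a link) as well as every function name are hypothetical choices made only for illustration.

```python
from collections import Counter


def build_vocab(posts):
    """Sorted list of all whitespace-separated tokens seen in the posts."""
    return sorted({tok for post in posts for tok in post.lower().split()})


def featurize(post, vocab):
    """Bag-of-words counts plus two toy hand-crafted features and a bias input."""
    counts = Counter(post.lower().split())
    bow = [float(counts[w]) for w in vocab]
    handcrafted = [
        float(post.count("!")),          # hypothetical cue: exclamation marks
        1.0 if "http" in post else 0.0,  # hypothetical cue: post cites a link
    ]
    return bow + handcrafted + [1.0]     # trailing 1.0 acts as the bias input


def train_perceptron(X, y, epochs=20):
    """Plain perceptron; labels are +1 (credible) or -1 (not credible)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if label * score <= 0:       # misclassified: move weights toward label
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 if the linear score is positive, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def accuracy(w, X, y):
    """Fraction of examples whose predicted label matches the annotation."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


# Placeholder data (assumption): four tiny English posts with made-up labels.
train_posts = [
    "officials confirmed the report see http://example.org",
    "the ministry published figures http://example.org/data",
    "shocking secret they hide from you !!!",
    "unbelievable rumor spreads everywhere !!",
]
train_labels = [1, 1, -1, -1]

vocab = build_vocab(train_posts)
X_train = [featurize(p, vocab) for p in train_posts]
weights = train_perceptron(X_train, train_labels)
train_accuracy = accuracy(weights, X_train, train_labels)
```

In a realistic setting the accuracy would be measured on held-out annotated posts rather than on the training set, and the feature extractor would rely on Arabic-specific morphological analysis instead of whitespace tokenization.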
Cite this article
El-Hajj, W., Brahim, G.B. & Zaatari, A. Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models. Soc. Netw. Anal. Min. 11, 72 (2021). https://doi.org/10.1007/s13278-021-00782-8
DOI: https://doi.org/10.1007/s13278-021-00782-8