Abstract
We present an extensive study on disinformation, which is defined as information that is false and misleading and intentionally shared to cause harm. Through this work, we aim to answer the following questions:
- Can we automatically and accurately classify a news article as containing disinformation?
- What characteristics of disinformation differentiate it from benign types of information?
We conduct this study in the context of two significant events: the US elections of 2016 and the 2020 COVID pandemic. We build a series of classifiers to (i) examine linguistic clues exhibited by different types of fake news articles, (ii) analyze the “clickbaityness” of disinformation headlines, and (iii) perform fine-grained, veracity-based article classification through a natural language inference (NLI) module for automated disinformation verification, which draws on a manually curated set of evidence sources. For this last task, we built a new dataset annotated with generic, veracity-based labels and ground truth evidence supporting each label. The veracity labels were formulated by examining the standards used by reputable fact-checking organizations. We show that disinformation derives features from both propaganda and mainstream news, which makes it more challenging to detect. Nevertheless, there is significant potential for automating the fact-checking process so that it incorporates the degree of veracity. We provide an error analysis that illustrates the challenges involved in the automated fact-checking task and identifies factors that may improve this process in future work. Finally, we describe the implementation of a web app that extracts important entities and actions from a given article and searches the web to gather evidence from credible sources. The evidence articles are then used to generate a veracity label that can assist manual fact-checkers engaged in combating disinformation.
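As a rough illustration of the last step described above, the following sketch shows one way per-evidence NLI outcomes could be combined into a single veracity label. It is not the authors' implementation: the label names, the `NLI_TO_VERACITY` mapping, the `aggregate_veracity` function, and the majority-vote rule are all assumptions made for this example, and the NLI model that would produce the input labels is left out.

```python
from collections import Counter
from typing import Iterable

# Hypothetical mapping from NLI outcomes to veracity labels. The label set
# actually used in the study is not given in this section.
NLI_TO_VERACITY = {
    "entailment": "supported",
    "contradiction": "refuted",
    "neutral": "not enough evidence",
}


def aggregate_veracity(nli_labels: Iterable[str]) -> str:
    """Combine per-evidence NLI labels into one veracity label.

    Assumed rule: majority vote between entailment and contradiction;
    ties, all-neutral input, and empty input yield 'not enough evidence'.
    """
    # Count only labels we know how to map; ignore anything else.
    counts = Counter(label for label in nli_labels if label in NLI_TO_VERACITY)
    supporting = counts["entailment"]
    refuting = counts["contradiction"]
    if supporting == refuting:
        # Covers ties as well as the case where neither label occurs.
        return NLI_TO_VERACITY["neutral"]
    winner = "entailment" if supporting > refuting else "contradiction"
    return NLI_TO_VERACITY[winner]


if __name__ == "__main__":
    # Each label stands for one (evidence article, claim) NLI prediction.
    print(aggregate_veracity(["entailment", "neutral", "entailment", "contradiction"]))
```

A simple vote is used here only because it is easy to inspect; a confidence-weighted aggregation would be an equally plausible choice under the same assumptions.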
Notes
The code and data used for the study in this section are available at https://github.com/architapathak/Disinformation-Analysis-and-Identification.
About this article
Cite this article
Pathak, A., Srihari, R.K. & Natu, N. Disinformation: analysis and identification. Comput Math Organ Theory 27, 357–375 (2021). https://doi.org/10.1007/s10588-021-09336-x
DOI: https://doi.org/10.1007/s10588-021-09336-x