Disinformation: analysis and identification

Pathak, Archita; Srihari, Rohini K.; Natu, Nihit

doi:10.1007/s10588-021-09336-x

S.I. : SBP-BRiMS2020
Published: 18 June 2021

Volume 27, pages 357–375 (2021)

Computational and Mathematical Organization Theory

Abstract

We present an extensive study on disinformation, which is defined as information that is false and misleading and intentionally shared to cause harm. Through this work, we aim to answer the following questions:

Can we automatically and accurately classify a news article as containing disinformation?
What characteristics of disinformation differentiate it from other types of benign information?

We conduct this study in the context of two significant events: the US elections of 2016 and the 2020 COVID pandemic. We build a series of classifiers to (i) examine linguistic clues exhibited by different types of fake news articles, (ii) analyze “clickbaityness” of disinformation headlines, and (iii) finally, perform fine-grained, veracity-based article classification through a natural language inference (NLI) module for automated disinformation verification; this utilizes a manually curated set of evidence sources. For the latter, we built a new dataset that is annotated with generic, veracity-based labels and ground truth evidence supporting each label. The veracity labels were formulated based on examining standards used by reputable fact-checking organizations. We show that disinformation derives features from both propaganda and mainstream news, making it more challenging to detect. However, there is significant potential for automating the fact-checking process to incorporate the degree of veracity. We provide error analysis that illustrates the challenges involved in the automated fact-checking task and identifies factors that may improve this process in future work. Finally, we also describe the implementation of a web app that extracts important entities and actions from a given article and searches the web to gather evidence from credible sources. The evidence articles are then used to generate a veracity label that can assist manual fact-checkers engaged in combating disinformation.
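
The evidence-based verification step described above can be illustrated with a minimal sketch. It assumes each evidence passage is compared against the claim by an NLI model that returns entailment, contradiction, or neutral, and that these per-passage outputs are then combined into one veracity label. The function names, the word-overlap stand-in for the NLI model, and the four label names are all hypothetical; the abstract does not specify the paper's actual label set or aggregation rule.

```python
from collections import Counter
from typing import Callable, Iterable


def toy_nli(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in for a trained NLI model.

    Uses word overlap and a crude negation check so the example runs
    without any model; a real system would call a trained classifier.
    """
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    overlap = len(p & h) / max(len(h), 1)
    if overlap < 0.5:
        return "neutral"  # evidence is not about the claim
    negation_mismatch = ("not" in p) != ("not" in h)
    return "contradiction" if negation_mismatch else "entailment"


def veracity_label(
    claim: str,
    evidence: Iterable[str],
    nli: Callable[[str, str], str] = toy_nli,
) -> str:
    """Aggregate per-evidence NLI outputs into one label.

    The label names below are illustrative assumptions, not the
    labels used in the paper.
    """
    votes = Counter(nli(passage, claim) for passage in evidence)
    support = votes["entailment"]
    refute = votes["contradiction"]
    if support == 0 and refute == 0:
        return "unverified"
    if refute == 0:
        return "supported"
    if support == 0:
        return "refuted"
    return "partially supported"


if __name__ == "__main__":
    passages = ["health agency says the vaccine is safe"]
    print(veracity_label("the vaccine is not safe", passages))
```

Keeping the NLI model behind a plain callable means the aggregation rule can be tested on its own and a trained model can be swapped in later without changing the labeling logic.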

Author information

Authors and Affiliations

University at Buffalo (SUNY), 338D Davis Hall, Buffalo, NY, 14260-2500, USA
Archita Pathak, Rohini K. Srihari & Nihit Natu
Farmington Hills, USA
Archita Pathak
Buffalo, USA
Nihit Natu

Corresponding author

Correspondence to Archita Pathak.

About this article

Cite this article

Pathak, A., Srihari, R.K. & Natu, N. Disinformation: analysis and identification. Comput Math Organ Theory 27, 357–375 (2021). https://doi.org/10.1007/s10588-021-09336-x

Issue Date: September 2021