Disinformation: analysis and identification

  • S.I.: SBP-BRiMS2020
  • Computational and Mathematical Organization Theory

Abstract

We present an extensive study of disinformation, defined as information that is false and misleading and that is intentionally shared to cause harm. Through this work, we aim to answer the following questions:

  • Can we automatically and accurately classify a news article as containing disinformation?

  • What characteristics of disinformation differentiate it from other types of benign information?

We conduct this study in the context of two significant events: the 2016 US elections and the 2020 COVID-19 pandemic. We build a series of classifiers to (i) examine the linguistic clues exhibited by different types of fake news articles, (ii) analyze the "clickbaityness" of disinformation headlines, and (iii) perform fine-grained, veracity-based article classification through a natural language inference (NLI) module for automated disinformation verification, which draws on a manually curated set of evidence sources. For the latter, we built a new dataset annotated with generic, veracity-based labels and with ground-truth evidence supporting each label. The veracity labels were formulated after examining the standards used by reputable fact-checking organizations. We show that disinformation derives features from both propaganda and mainstream news, which makes it more challenging to detect. However, there is significant potential for automating the fact-checking process so that it incorporates the degree of veracity. We provide an error analysis that illustrates the challenges involved in automated fact-checking and identifies factors that may improve this process in future work. Finally, we describe the implementation of a web app that extracts important entities and actions from a given article and searches the web to gather evidence from credible sources. These evidence articles are then used to generate a veracity label that can assist manual fact-checkers engaged in combating disinformation.
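The final step of the pipeline is described only at a high level: evidence articles from credible sources are passed through an NLI module and turned into a veracity label. The following is a minimal sketch of how such a step might look, not the authors' implementation. It assumes an NLI model has already labelled each (evidence article, claim) pair with one of the three classes used by common NLI corpora (entailment, neutral, contradiction). The domain allowlist, the vote-based aggregation rule, the `min_support` threshold, and the output labels ("true", "false", "mixed", "not enough evidence") are all hypothetical stand-ins for the paper's curated evidence sources and veracity labels.

```python
from collections import Counter
from collections.abc import Iterable
from urllib.parse import urlparse

# The three classes used by standard NLI corpora such as SNLI and MultiNLI.
NLI_LABELS = ("entailment", "neutral", "contradiction")


def from_credible_source(url: str, allowlist: set[str]) -> bool:
    """Return True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)


def aggregate_veracity(predictions: Iterable[str], min_support: float = 0.5) -> str:
    """Map per-evidence NLI labels to a coarse veracity label.

    Each prediction is the NLI label for one pair with the evidence article as
    premise and the claim as hypothesis. The returned labels are hypothetical
    placeholders, not the veracity labels defined in the paper.
    """
    counts = Counter(predictions)
    unknown = set(counts) - set(NLI_LABELS)
    if unknown:
        raise ValueError(f"unexpected NLI labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return "not enough evidence"
    support = counts["entailment"] / total    # share of evidence backing the claim
    refute = counts["contradiction"] / total  # share of evidence contradicting it
    if support >= min_support and refute == 0:
        return "true"
    if refute >= min_support and support == 0:
        return "false"
    if support > 0 and refute > 0:
        return "mixed"
    return "not enough evidence"


def verify(evidence: list[tuple[str, str]], allowlist: set[str]) -> str:
    """Keep only evidence from allowlisted sources, then aggregate its NLI labels."""
    credible = [label for url, label in evidence if from_credible_source(url, allowlist)]
    return aggregate_veracity(credible)


if __name__ == "__main__":
    allowlist = {"reuters.com", "apnews.com"}  # hypothetical credible sources
    evidence = [
        ("https://www.reuters.com/article-1", "contradiction"),
        ("https://unreliable.example/post", "entailment"),  # dropped: not allowlisted
        ("https://apnews.com/article-2", "neutral"),
    ]
    print(verify(evidence, allowlist))  # false
```

A transparent vote over NLI labels is easy to audit, which matters when the output is meant to assist a human fact-checker rather than replace one; a learned aggregator over NLI class probabilities would be one possible alternative.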

Notes

  1. https://www.journalism.org/2019/06/05/many-americans-say-made-up-news-is-a-critical-problem-that-needs-to-be-fixed/.

  2. https://www.nature.com/articles/d41586-020-01409-2.

  3. https://news.un.org/en/story/2020/04/1061592.

  4. https://www.cnet.com/news/fake-5g-coronavirus-theories-have-real-world-consequences/.

  5. The code and data used for the study in this section are available at https://github.com/architapathak/Disinformation-Analysis-and-Identification.

  6. https://www.kaggle.com/mrisdal/fake-news.

  7. https://www.politifact.com/article/2017/apr/20/politifacts-guide-fake-news-websites-and-what-they/.

  8. https://euvsdisinfo.eu/about/.

  9. https://components.one/datasets/all-the-news-2-news-articles-dataset/.

  10. https://www.cits.ucsb.edu/fake-news/what-is-fake-news.

  11. https://mediabiasfactcheck.com/.

  12. https://mediabiasfactcheck.com/center/.

Author information

Corresponding author

Correspondence to Archita Pathak.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pathak, A., Srihari, R.K. & Natu, N. Disinformation: analysis and identification. Comput Math Organ Theory 27, 357–375 (2021). https://doi.org/10.1007/s10588-021-09336-x

  • DOI: https://doi.org/10.1007/s10588-021-09336-x