Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12657)

Abstract

This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation, a first effort of this kind, to the best of our knowledge. The goal is to build multilingual systems that retrieve in evidence-rich languages to verify claims in evidence-poor languages that are more commonly targeted by disinformation. To this end, our EnmBERT fact verification system shows evidence of transfer learning ability, and a 400-example mixed English-Romanian dataset is made available for cross-lingual transfer learning evaluation.

Notes

  1. https://www.pewresearch.org/fact-tank/2020/04/09/most-european-students-learn-english-in-school/.

  2. https://meta.wikimedia.org/wiki/List_of_Wikipedias.

  3. https://twitter.com/MsAmericanPie_/status/1287969874036379649.

  4. https://twitter.com/hashtag/Pacepa.

  5. https://github.com/D-Roberts/multilingual_nli_ECIR2021.

  6. https://github.com/google-research/bert/blob/master/multilingual.md.

  7. https://www.mediawiki.org/wiki/API:Main_page.

  8. https://github.com/huggingface/transformers.

  9. https://github.com/sheffieldnlp/fever-scorer.

  10. https://github.com/D-Roberts/multilingual_nli_ECIR2021.

Author information

Corresponding author

Correspondence to Denisa A. Olteanu Roberts.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Olteanu Roberts, D.A. (2021). Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_36

  • DOI: https://doi.org/10.1007/978-3-030-72240-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72239-5

  • Online ISBN: 978-3-030-72240-1

  • eBook Packages: Computer Science, Computer Science (R0)
