A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages

Published: 02 November 2021

Abstract

Fake news classification has attracted considerable attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most existing work on fake news detection targets English, which limits its usability outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages remains a challenge because annotated corpora and tools are unavailable. This article proposes an effective neural model based on the multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A wide range of experiments, including language-specific and domain-specific settings, is conducted. The proposed model achieves high accuracy in domain-specific and domain-agnostic experiments, and it also outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, with encouraging results. Cross-domain transfer experiments are also performed to assess language-independent feature transfer of the model. We also offer a multilingual, multidomain fake news detection dataset covering five languages and seven domains that could be useful for research and development in resource-scarce scenarios.

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1
January 2022
442 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3494068
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2021
Accepted: 01 June 2021
Revised: 01 June 2021
Received: 01 September 2020
Published in TALLIP Volume 21, Issue 1

Author Tags

  1. Fake news detection
  2. low-resource languages
  3. multilingual
  4. Hindi
  5. Swahili
  6. Indonesian
  7. Vietnamese

Qualifiers

  • Research-article
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 646
  • Downloads (Last 6 weeks): 40
Reflects downloads up to 19 Feb 2025

Citations

Cited By

  • (2025) Adversarial Data Poisoning for Fake News Detection: How to Make a Model Misclassify a Target News Without Modifying it. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 525-530. DOI: 10.1007/978-3-031-74627-7_44. Online publication date: 1-Jan-2025
  • (2024) Pre-Trained Language Model Ensemble for Arabic Fake News Detection. Mathematics 12:18 (2941). DOI: 10.3390/math12182941. Online publication date: 21-Sep-2024
  • (2024) EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 5380-5384. DOI: 10.1145/3627673.3679167. Online publication date: 21-Oct-2024
  • (2024) A Novel Natural Language Processing Model Transfer Strategy Tailored for Deep Learning Platforms. Journal of Circuits, Systems and Computers 34:02. DOI: 10.1142/S0218126625500501. Online publication date: 28-Oct-2024
  • (2024) Deep Learning-Based Human Action Recognition in Videos. Journal of Circuits, Systems and Computers. DOI: 10.1142/S0218126625500409. Online publication date: 28-Sep-2024
  • (2024) A Novel Fake News Detection Model for Context of Mixed Languages Through Multiscale Transformer. IEEE Transactions on Computational Social Systems 11:4 (5079-5089). DOI: 10.1109/TCSS.2023.3298480. Online publication date: Aug-2024
  • (2024) Don’t Be Misled by Emotion! Disentangle Emotions and Semantics for Cross-Language and Cross-Domain Rumor Detection. IEEE Transactions on Big Data 10:3 (249-259). DOI: 10.1109/TBDATA.2023.3334634. Online publication date: Jun-2024
  • (2024) Can BERT Learn Evidence-Aware Representation for Low Resource Fake News Detection? 2024 International Conference on Computer, Control, Informatics and its Applications (IC3INA), 231-236. DOI: 10.1109/IC3INA64086.2024.10732358. Online publication date: 9-Oct-2024
  • (2024) Improved Fake News Detection by Combining Sentence Transformers, Variational Autoencoders, and Topic Modelling: the VAE-Topic Model Fusion Method. 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT), 857-863. DOI: 10.1109/IC2PCT60090.2024.10486402. Online publication date: 9-Feb-2024
  • (2024) Deciphering Deception: Unmasking Fake News in Multilingual Contexts. 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT), 807-812. DOI: 10.1109/IC2PCT60090.2024.10486302. Online publication date: 9-Feb-2024
