research-article

A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages

Authors:

Dibyanayan Bandyopadhyay,

Asif Ekbal

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1

Article No.: 9, Pages 1–20

https://doi.org/10.1145/3472619

Published: 02 November 2021

Abstract

Fake news classification has attracted considerable attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most current work on fake news detection targets English, which limits its usability, especially for populations that are not literate in English. Although multilingual web content has grown, fake news classification in low-resource languages remains challenging because annotated corpora and tools are not available. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. We conduct a wide range of experiments, including language-specific and domain-specific settings. The proposed model achieves high accuracy in both domain-specific and domain-agnostic experiments and outperforms current state-of-the-art models. We also perform experiments in zero-shot settings to assess how well language-agnostic features transfer across languages, with encouraging results. Cross-domain transfer experiments are also performed to assess the model's language-independent feature transfer. Finally, we offer a multilingual, multidomain fake news detection dataset covering five languages and seven domains that could be useful for research and development in resource-scarce scenarios.
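The abstract does not describe the classification layer placed on top of multilingual BERT. As a minimal sketch, assuming the standard BERT fine-tuning recipe (a single linear layer with softmax over the final hidden state of the [CLS] token), the prediction step looks like the code below. The label order, the toy 4-dimensional vector (standing in for the 768-dimensional hidden state of a BERT-base encoder), the weights, and the `predict_label` helper are all hypothetical illustrations, not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order; the abstract does not specify the label set.
LABELS = ("real", "fake")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One logit per class: logit_k = w_k . x + b_k."""
    return [sum(w * xi for w, xi in zip(w_k, x)) + b_k
            for w_k, b_k in zip(weights, bias)]


def predict_label(cls_vector: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> Tuple[str, List[float]]:
    """Hypothetical helper: linear + softmax head over a [CLS] vector."""
    probs = softmax(linear(cls_vector, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """Loss that fine-tuning would minimize for the gold class index."""
    return -math.log(probs[gold])


if __name__ == "__main__":
    # Toy stand-in for an mBERT [CLS] representation (hypothetical values).
    cls_vec = [0.5, -1.0, 2.0, 0.0]
    W = [[0.1, 0.2, -0.3, 0.0],
         [-0.2, 0.1, 0.4, 0.3]]
    b = [0.0, 0.1]
    label, probs = predict_label(cls_vec, W, b)
    print(label)  # fake
```

In a typical fine-tuning setup, both the head and the encoder weights would be updated by minimizing `cross_entropy` over labeled articles; whether the authors froze any encoder layers is not stated in this abstract.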

Cited By

Siciliano F, Maiano L, Papa L, Baccini F, Amerini I, Silvestri F (2025). Adversarial Data Poisoning for Fake News Detection: How to Make a Model Misclassify a Target News Without Modifying it. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 525–530. Online publication date: 1-Jan-2025. https://doi.org/10.1007/978-3-031-74627-7_44

Al-Zahrani L, Al-Yahya M (2024). Pre-Trained Language Model Ensemble for Arabic Fake News Detection. Mathematics 12:18 (2941). Online publication date: 21-Sep-2024. https://doi.org/10.3390/math12182941

Leite J, Razuvayevskaya O, Bontcheva K, Scarton C, Serra E, Spezzano F (2024). EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 5380–5384. Online publication date: 21-Oct-2024. https://dl.acm.org/doi/10.1145/3627673.3679167

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 1

January 2022

442 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3494068

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2021

Accepted: 01 June 2021

Revised: 01 June 2021

Received: 01 September 2020

Published in TALLIP Volume 21, Issue 1

Qualifiers

Research-article
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 25

Total Downloads: 2,305

Downloads (last 12 months): 646
Downloads (last 6 weeks): 40

Reflects downloads up to 19 Feb 2025
