Abstract
This paper describes our approach to the COVID-19 Fake News Detection in English subtask of the CONSTRAINT 2021 shared task using RoBERTa. We analyse the data extensively to understand its distribution. To achieve an F1 score of 0.96, we incorporate external sources of misinformation and fine-tune multiple state-of-the-art pretrained deep learning models. Finally, we visualise the true and false positives predicted by our model to guide improvements in future work.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zutshi, A., Raj, A. (2021). Tackling the Infodemic: Analysis Using Transformer Based Models. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_10
DOI: https://doi.org/10.1007/978-3-030-73696-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73695-8
Online ISBN: 978-3-030-73696-5
eBook Packages: Computer Science, Computer Science (R0)