Abstract
Free, uncontrolled access to the Internet is the main driver of fake news propagation, both in social media and in regular online publications. In this paper we study the potential of several BERT-based models to detect fake news related to politics. Our contribution consists of testing BERT, RoBERTa and RoBERTa-MNLI models on (a) short and long texts; (b) ensembles of the best models; (c) noisy texts. To improve ensembling, we introduce an additional class, ‘Doubtful news’. To create noisy data we use cross-translation. For the experiments we consider the well-known FRN (Fake vs. Real News, long texts) and LIAR (short texts) datasets. The results obtained on the long-text dataset are higher than those obtained on the short-text dataset. The proposed approach to ensembling yields a significant improvement in the results. The experiments with noisy data demonstrate the high noise immunity of the BERT model on long news and of the RoBERTa model on short news.
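As a rough illustration of the ensembling idea described above, the sketch below averages the class probabilities of two fine-tuned classifiers (a BERT and a RoBERTa checkpoint) and falls back to the additional ‘Doubtful news’ class when the averaged scores are too close to call. The checkpoint paths, the averaging rule and the margin threshold are assumptions made for illustration; the Hugging Face transformers API is used, but this is not the authors’ exact implementation.

```python
# A minimal sketch of the ensembling idea from the abstract, using the Hugging Face
# `transformers` library. The checkpoint paths, the probability-averaging rule and
# the margin that triggers the 'Doubtful news' class are illustrative assumptions,
# not the authors' exact implementation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical fine-tuned binary classifiers (label 0 = real, label 1 = fake).
CHECKPOINTS = ["./bert-fakenews", "./roberta-fakenews"]
MODELS = [(AutoTokenizer.from_pretrained(c),
           AutoModelForSequenceClassification.from_pretrained(c))
          for c in CHECKPOINTS]


def predict_proba(tokenizer, model, text: str) -> torch.Tensor:
    """Softmax class probabilities of one fine-tuned classifier for one text."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).squeeze(0)


def ensemble_label(text: str, margin: float = 0.15) -> str:
    """Average the models' probabilities; if the 'fake' and 'real' scores are
    closer than `margin`, fall back to the additional 'Doubtful news' class."""
    avg = torch.stack([predict_proba(tok, m, text) for tok, m in MODELS]).mean(dim=0)
    p_real, p_fake = avg[0].item(), avg[1].item()
    if abs(p_fake - p_real) < margin:
        return "Doubtful news"
    return "Fake news" if p_fake > p_real else "Real news"


if __name__ == "__main__":
    print(ensemble_label("The senator claimed the new bill cuts taxes for everyone."))
```

In practice the margin and the aggregation rule would be tuned on a validation split; the sketch only shows how a third, low-confidence class can sit on top of two binary classifiers.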
Cite this paper
Shushkevich, E., Alexandrov, M., Cardiff, J. (2022). BERT-based Classifiers for Fake News Detection on Short and Long Texts with Noisy Data: A Comparative Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science(), vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_22