
FakeRecogna: A New Brazilian Corpus for Fake News Detection

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2022)

Abstract

Fake news has become a research topic of great importance in Natural Language Processing due to its negative impact on society. Despite its relevance, few datasets are available in Brazilian Portuguese, and most of them comprise only a small number of samples. Therefore, this paper proposes a new fake news dataset, named FakeRecogna, that contains a larger number of samples and more up-to-date news, and covers a few of the most important news categories. We perform a preliminary evaluation on the new dataset using traditional classifiers such as Naive Bayes, Optimum-Path Forest, and Support Vector Machines. A Convolutional Neural Network is also evaluated for fake news detection on the proposed dataset.
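The abstract names Naive Bayes among the traditional baselines but does not specify the variant or the text representation. As an illustration only, the following is a minimal pure-Python sketch of a multinomial Naive Bayes classifier over bag-of-words counts with Laplace smoothing. The tokenizer, the function names (`train_nb`, `predict_nb`), and the toy Portuguese sentences are hypothetical assumptions, not the authors' pipeline or FakeRecogna data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase + whitespace split; a deliberate simplification (no
    # punctuation stripping, stopword removal, or lemmatization).
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    # log P(class)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # log P(word | class), smoothed so unseen (word, class) pairs are never zero
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict_nb(model, doc):
    log_priors, log_likelihoods, vocab = model
    scores = {}
    for c, prior in log_priors.items():
        # A sum of logs instead of a product of probabilities avoids underflow;
        # tokens never seen in training are ignored.
        scores[c] = prior + sum(
            log_likelihoods[c][w] for w in tokenize(doc) if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, NOT drawn from FakeRecogna.
docs = [
    "vacina causa chip implantado urgente compartilhe",
    "urgente governo esconde cura milagrosa compartilhe",
    "ministério da saúde divulga boletim de vacinação",
    "prefeitura anuncia calendário de vacinação",
]
labels = ["fake", "fake", "real", "real"]

model = train_nb(docs, labels)
print(predict_nb(model, "compartilhe urgente cura"))  # → fake
print(predict_nb(model, "boletim de vacinação"))      # → real
```

Working in log space keeps each class score a simple sum, and the smoothing parameter `alpha` plays the same role as the additive smoothing found in common library implementations.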


Notes

  1. https://www.kaggle.com/rogeriochaves/boatos-de-whatsapp-boatosorg.

  2. https://reporterslab.org/fact-checking/.

  3. https://g1.globo.com/.

  4. https://www.uol.com.br/.

  5. https://extra.globo.com/.

  6. https://www.gov.br/saude/pt-br.

  7. https://github.com/recogna-lab/datasets/tree/master/FakeRecogna.

  8. We used the implementation provided by the OPFython library [26].

  9. We used the well-known TensorFlow library [1].

  10. https://keras.io/api/optimizers/adam/.

  11. https://keras.io/api/losses/probabilistic_losses/.
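Note 8 states that the Optimum-Path Forest (OPF) classifier was run through the OPFython library [26]. To illustrate what supervised OPF computes [24], the following is a minimal, dependency-free Python sketch of its two phases: prototypes are taken as the endpoints of minimum-spanning-tree edges that join samples of different classes, and these prototypes then compete for the remaining samples under the f_max path cost (the largest edge weight along a path). This is not OPFython's API; the function names, the Euclidean distance, and the toy data are assumptions for illustration.

```python
import heapq
import math


def _dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _mst_edges(X):
    # Prim's algorithm on the complete graph over the training samples.
    n = len(X)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = _dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def opf_fit(X, y):
    """Supervised OPF training: MST-based prototypes, then an f_max competition."""
    n = len(X)
    # Prototypes: endpoints of MST edges connecting samples of different classes.
    prototypes = {v for i, j in _mst_edges(X) if y[i] != y[j] for v in (i, j)}
    cost, label, done = [math.inf] * n, list(y), [False] * n
    for p in prototypes:
        cost[p] = 0.0
    heap = [(0.0, p) for p in prototypes]
    heapq.heapify(heap)
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if not done[t]:
                # f_max path cost: the maximum edge weight along the path.
                new_cost = max(cost[s], _dist(X[s], X[t]))
                if new_cost < cost[t]:
                    cost[t], label[t] = new_cost, label[s]
                    heapq.heappush(heap, (new_cost, t))
    return X, cost, label


def opf_predict(model, x):
    # A new sample takes the label of the training node offering the cheapest path.
    X, cost, label = model
    best_cost, best_label = math.inf, label[0]
    for s in range(len(X)):
        c = max(cost[s], _dist(X[s], x))
        if c < best_cost:
            best_cost, best_label = c, label[s]
    return best_label


# Hypothetical 2-D toy data: two well-separated clusters.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [0, 0, 0, 1, 1, 1]
model = opf_fit(X, y)
print(opf_predict(model, (0.2, 0.3)), opf_predict(model, (5.5, 5.2)))  # → 0 1
```

Because every training node stores its optimum path cost, classifying a new sample reduces to a single pass over the training set; the exhaustive loops here favor readability over efficiency.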

References

  1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, pp. 265–283. OSDI 2016, USENIX Association, USA (2016)

  2. Abirami, S., Chitra, P.: Energy-efficient edge based real-time healthcare support system. In: Raj, P., Evangeline, P. (eds.) The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases, Advances in Computers, vol. 117, pp. 339–368. Elsevier, Amsterdam (2020)

  3. Abonizio, H.Q., de Morais, J.I., Tavares, G.M., Barbon Junior, S.: Language-independent fake news detection: English, Portuguese, and Spanish mutual features. Future Internet 12(5), 87 (2020)

  4. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using N-gram analysis and machine learning techniques. In: Traore, I., Woungang, I., Awad, A. (eds.) ISDDC 2017. LNCS, vol. 10618, pp. 127–138. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69155-8_9

  5. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. J. Econ. Perspect. 31, 211–236 (2017)

  6. Aphiwongsophon, S., Chongstitvatana, P.: Detecting fake news with machine learning method. In: 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 528–531 (2018)

  7. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)

  8. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)

  9. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. COLT 1992, Association for Computing Machinery, New York, NY, USA (1992)

  10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  11. Dalto, M., Matuško, J., Vašak, M.: Deep neural networks for ultra-short-term wind forecasting, pp. 1657–1663 (2015)

  12. Endo, P.T., et al.: Covid-19 rumor: a classified dataset of covid-19 related online rumors in Brazilian Portuguese. In: Mendeley Data V3 (2021)

  13. Ferreira, A., Giraldi, G.: Convolutional neural network approaches to granite tiles classification. Expert Syst. App. 84, 19–29 (2017)

  14. Gilda, S.: Evaluating machine learning algorithms for fake news detection. In: 2017 IEEE 15th Student Conference on Research and Development (SCOReD), pp. 110–115 (2017)

  15. Hippisley, A.: Lexical analysis. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn., pp. 31–58. Chapman and Hall/CRC, Boca Raton (2010)

  16. Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To Appear 7, 411–420 (2017)

  17. Jain, M.K., Gopalani, D., Meena, Y.K., Kumar, R.: Machine learning based fake news detection using linguistic features and word vector features. In: 2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6 (2020)

  18. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013)

  19. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

  20. Monteiro, R.A., et al.: Contributions to the study of fake news in Portuguese: new corpus and automatic detection results. In: Villavicencio, A., et al. (eds.) PROPOR 2018. LNCS (LNAI), vol. 11122, pp. 324–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99722-3_33

  21. Moreno, J., Bressan, G.: Factck.br: a new dataset to study fake news. In: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web, pp. 525–527. WebMedia 2019, Association for Computing Machinery, New York, NY, USA (2019)

  22. Okano, E.Y., Liu, Z., Ji, D., Ruiz, E.E.S.: Fake news detection on Fake.Br using hierarchical attention networks. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds.) PROPOR 2020. LNCS (LNAI), vol. 12037, pp. 143–152. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41505-1_14

  23. Papa, J.P., Falcão, A.X., Albuquerque, V.H.C., Tavares, J.M.R.S.: Efficient supervised optimum-path forest classification for large datasets. Pattern Recogn. 45(1), 512–520 (2012)

  24. Papa, J.P., Falcão, A.X., Suzuki, C.T.N.: Supervised pattern classification based on optimum-path forest. Int. J. Imaging Syst. Technol. 19(2), 120–131 (2009)

  25. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)

  26. de Rosa, G.H., Papa, J.P., Falcão, A.X.: OPFython: a python-inspired optimum-path forest classifier (2020). https://arxiv.org/abs/2001.10420

  27. Rubin, V.L., Chen, Y., Conroy, N.J.: Deception detection for news: three types of fakes. In: Proceedings of the 78th ASIST Annual Meeting: Information Science with Impact: Research in and for the Community, ASIST 2015, American Society for Information Science, USA (2015)

  28. Silva, R.M., Santos, R.L., Almeida, T.A., Pardo, T.A.: Towards automatically filtering fake news in Portuguese. Expert Syst. App. 146, 113199 (2020)

  29. de Souza, M.P., da Silva, F.R.M., Freire, P.M.S., Goldschmidt, R.R.: A linguistic-based method that combines polarity, emotion and grammatical characteristics to detect fake news in Portuguese. In: Proceedings of the Brazilian Symposium on Multimedia and the Web, pp. 217–224. WebMedia 2020, Association for Computing Machinery, New York, NY, USA (2020)


Acknowledgments

The authors are grateful to FAPESP (grants #2013/07375-0, #2014/12236-1, and #2019/07665-4) and CNPq (grants #307066/2017-7 and #427968/2018-6).


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Garcia, G.L., Afonso, L.C.S., Papa, J.P. (2022). FakeRecogna: A New Brazilian Corpus for Fake News Detection. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_6


  • DOI: https://doi.org/10.1007/978-3-030-98305-5_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98304-8

  • Online ISBN: 978-3-030-98305-5

  • eBook Packages: Computer Science, Computer Science (R0)
