FakeRecogna: A New Brazilian Corpus for Fake News Detection

Garcia, Gabriel L.; Afonso, Luis C. S.; Papa, João P.

doi:10.1007/978-3-030-98305-5_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13208)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Fake news has become a research topic of great importance in Natural Language Processing due to its negative impact on our society. Despite its pertinence, few datasets are available in Brazilian Portuguese, and most comprise few samples. Therefore, this paper proposes a new fake news dataset named FakeRecogna that contains a greater number of samples and more up-to-date news, and covers a few of the most important categories. We perform a toy evaluation over the created dataset using traditional classifiers such as Naive Bayes, Optimum-Path Forest, and Support Vector Machines. A Convolutional Neural Network is also evaluated in the context of fake news detection in the proposed dataset.

Notes

1. https://www.kaggle.com/rogeriochaves/boatos-de-whatsapp-boatosorg.
2. https://reporterslab.org/fact-checking/.
3. https://g1.globo.com/.
4. https://www.uol.com.br/.
5. https://extra.globo.com/.
6. https://www.gov.br/saude/pt-br.
7. https://github.com/recogna-lab/datasets/tree/master/FakeRecogna.
8. We used an implementation provided by OPFython library [26].
9. We used the well-known TensorFlow library [1].
10. https://keras.io/api/optimizers/adam/.
11. https://keras.io/api/losses/probabilistic_losses/.

References

1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, pp. 265–283. OSDI 2016, USENIX Association, USA (2016)
2. Abirami, S., Chitra, P.: Energy-efficient edge based real-time healthcare support system. In: Raj, P., Evangeline, P. (eds.) The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases, Advances in Computers, vol. 117, pp. 339–368. Elsevier, Amsterdam (2020)
3. Abonizio, H.Q., de Morais, J.I., Tavares, G.M., Barbon Junior, S.: Language-independent fake news detection: English, Portuguese, and Spanish mutual features. Future Internet 12(5), 87 (2020)
4. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using N-gram analysis and machine learning techniques. In: Traore, I., Woungang, I., Awad, A. (eds.) ISDDC 2017. LNCS, vol. 10618, pp. 127–138. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69155-8_9
5. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. J. Econ. Perspect. 31, 211–236 (2017)
6. Aphiwongsophon, S., Chongstitvatana, P.: Detecting fake news with machine learning method. In: 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 528–531 (2018)
7. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)
8. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
9. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. COLT 1992, Association for Computing Machinery, New York, NY, USA (1992)
10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
11. Dalto, M., Matuško, J., Vašak, M.: Deep neural networks for ultra-short-term wind forecasting, pp. 1657–1663 (2015)
12. Endo, P.T., et al.: Covid-19 rumor: a classified dataset of covid-19 related online rumors in Brazilian Portuguese. In: Mendeley Data V3 (2021)
13. Ferreira, A., Giraldi, G.: Convolutional neural network approaches to granite tiles classification. Expert Syst. App. 84, 19–29 (2017)
14. Gilda, S.: Evaluating machine learning algorithms for fake news detection. In: 2017 IEEE 15th Student Conference on Research and Development (SCOReD), pp. 110–115 (2017)
15. Hippisley, A.: Lexical analysis. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn., pp. 31–58. Chapman and Hall/CRC, Boca Raton (2010)
16. Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To Appear 7, 411–420 (2017)
17. Jain, M.K., Gopalani, D., Meena, Y.K., Kumar, R.: Machine learning based fake news detection using linguistic features and word vector features. In: 2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6 (2020)
18. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013)
19. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
20. Monteiro, R.A., et al.: Contributions to the study of fake news in Portuguese: new corpus and automatic detection results. In: Villavicencio, A., et al. (eds.) PROPOR 2018. LNCS (LNAI), vol. 11122, pp. 324–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99722-3_33
21. Moreno, J.a., Bressan, G.: Factck.br: a new dataset to study fake news. In: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web, pp. 525–527. WebMedia 2019, Association for Computing Machinery, New York, NY, USA (2019)
22. Okano, E.Y., Liu, Z., Ji, D., Ruiz, E.E.S.: Fake news detection on Fake.Br using hierarchical attention networks. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds.) PROPOR 2020. LNCS (LNAI), vol. 12037, pp. 143–152. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41505-1_14
23. Papa, J.P., Falcão, A.X., Albuquerque, V.H.C., Tavares, J.M.R.S.: Efficient supervised optimum-path forest classification for large datasets. Pattern Recogn. 45(1), 512–520 (2012)
24. Papa, J.P., Falcão, A.X., Suzuki, C.T.N.: Supervised pattern classification based on optimum-path forest. Int. J. Imaging Syst. Technol. 19(2), 120–131 (2009)
25. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
26. de Rosa, G.H., Papa, J.P., Falcão, A.X.: OPFython: a python-inspired optimum-path forest classifier (2020). https://arxiv.org/abs/2001.10420
27. Rubin, V.L., Chen, Y., Conroy, N.J.: Deception detection for news: three types of fakes. In: Proceedings of the 78th ASIST Annual Meeting: Information Science with Impact: Research in and for the Community, ASIST 2015, American Society for Information Science, USA (2015)
28. Silva, R.M., Santos, R.L., Almeida, T.A., Pardo, T.A.: Towards automatically filtering fake news in Portuguese. Expert Syst. App. 146, 113199 (2020)
29. de Souza, M.P., da Silva, F.R.M., Freire, P.M.S., Goldschmidt, R.R.: A linguistic-based method that combines polarity, emotion and grammatical characteristics to detect fake news in Portuguese. In: Proceedings of the Brazilian Symposium on Multimedia and the Web, pp. 217–224. WebMedia 2020, Association for Computing Machinery, New York, NY, USA (2020)

Acknowledgments

The authors are grateful to FAPESP grants #2013/07375-0, #2014/12236-1, and #2019/07665-4, and CNPq grants #307066/2017-7 and #427968/2018-6.

Author information

Authors and Affiliations

School of Sciences, São Paulo State University, Bauru, Brazil
Gabriel L. Garcia, Luis C. S. Afonso & João P. Papa

Editor information

Editors and Affiliations

Universidade de Fortaleza, Fortaleza, Brazil
Vládia Pinheiro
CiTIUS - Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Pablo Gamallo
Universidade Nova de Lisboa, Lisbon, Portugal
Raquel Amaro
University of Sheffield, Sheffield, UK
Carolina Scarton
INESC-ID, Lisbon, Portugal
Fernando Batista
Federal University of São Carlos, São Carlos, Brazil
Diego Silva
University of Lisbon, Lisbon, Portugal
Catarina Magro
Sentimonitor, Porto Alegre, Brazil
Hugo Pinto

About this paper

Cite this paper

Garcia, G.L., Afonso, L.C.S., Papa, J.P. (2022). FakeRecogna: A New Brazilian Corpus for Fake News Detection. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_6

DOI: https://doi.org/10.1007/978-3-030-98305-5_6
Published: 16 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science; Computer Science (R0)
