Detection of False Information in Spanish Using Machine Learning Techniques

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2022 (IDEAL 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13756)

Abstract

This research focuses on the detection of false claims in Spanish through the use of machine learning techniques. Most current work on automated or semi-automated fake news detection is carried out for English; however, there is still considerable room for improvement in other languages such as Spanish. Detecting fake news content and its dissemination on online platforms is a hard, open problem. This work focuses on detecting false and misleading information spread during the 2019 Spanish parliamentary election campaign and Catalonia crisis, the migration crisis, the COVID-19 pandemic, and hate speech against minorities. We propose the use of BERT, a machine learning model designed for natural language understanding tasks, which we trained and tested experimentally. We collected a corpus of different types of false information and claims, including articles, Facebook posts, WhatsApp messages, and tweets, among others. The results show that machine learning techniques can help identify false statements with more than 88% accuracy and can also help collect samples of false information. The experiments, which compare different machine learning methods, were also carried out on previously published datasets, allowing the different approaches to be compared.
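The headline figure above is an accuracy score: the fraction of claims whose predicted label matches the gold label. The minimal sketch below makes that concrete. It assumes a hypothetical binary labeling scheme ("false" for a false claim, "true" otherwise) and made-up toy labels, and computes accuracy alongside precision, recall and F1 for the false-claim class. It is not the authors' evaluation code, and the abstract does not say which metrics beyond accuracy the paper reports.

```python
"""Toy evaluation of a binary false-claim classifier (illustrative sketch)."""

from collections import Counter


def binary_metrics(y_true, y_pred, positive="false"):
    """Return accuracy, plus precision/recall/F1 for the `positive` class.

    The label strings ("false" for a false claim, "true" otherwise) are a
    hypothetical scheme chosen for this example, not the paper's.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Accuracy: share of examples whose predicted label equals the gold label.
    accuracy = sum(g == p for g, p in zip(y_true, y_pred)) / len(y_true)

    # Count true positives, false positives and false negatives for `positive`.
    counts = Counter()
    for gold, pred in zip(y_true, y_pred):
        if pred == positive and gold == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif gold == positive:
            counts["fn"] += 1
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up gold labels and predictions for eight claims.
    y_true = ["false", "false", "false", "true", "true", "true", "true", "true"]
    y_pred = ["false", "false", "true", "true", "true", "true", "true", "true"]
    print(binary_metrics(y_true, y_pred))
```

On these toy labels the classifier reaches 0.875 accuracy, yet its recall on false claims is only about 0.67 because one of the three false claims is missed. This is why accuracy is usually reported together with per-class metrics, particularly when true and false claims are imbalanced.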

Author information

Corresponding author

Correspondence to Arsenii Tretiakov.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tretiakov, A., Martín, A., Camacho, D. (2022). Detection of False Information in Spanish Using Machine Learning Techniques. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_5

  • DOI: https://doi.org/10.1007/978-3-031-21753-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21752-4

  • Online ISBN: 978-3-031-21753-1

  • eBook Packages: Computer Science, Computer Science (R0)
