Detection of False Information in Spanish Using Machine Learning Techniques

Tretiakov, Arsenii; Martín, Alejandro; Camacho, David

doi:10.1007/978-3-031-21753-1_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13756)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

This research focuses on detecting false claims in Spanish using machine learning techniques. Most current work on automated or semi-automated fake news detection targets the English language, and there is still considerable room for improvement in other languages such as Spanish. Detecting fake news content and its dissemination on online platforms is an open and hard problem. This work focuses on false and misleading information spread during the election campaign for the Spanish Parliament and the Catalonia crisis in 2019, the migration crisis, the COVID-19 pandemic, and hate speech against minorities. We propose the use of BERT, a machine learning model adapted for human language understanding tasks, which has been trained and experimentally tested. We have collected a corpus of different types of false information and claims, such as articles, Facebook posts, WhatsApp messages, tweets, and others. The results show that machine learning techniques can help identify false statements with more than 88% accuracy and help collect samples of false information. Experiments were also carried out on previously published datasets, providing a comparison between different machine learning methods.

Author information

Authors and Affiliations

Department of Computer System Engineering, Universidad Politécnica de Madrid, Calle de Alan Turing, 28031, Madrid, Spain
Arsenii Tretiakov, Alejandro Martín & David Camacho

Corresponding author

Correspondence to Arsenii Tretiakov.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino

About this paper

Cite this paper

Tretiakov, A., Martín, A., Camacho, D. (2022). Detection of False Information in Spanish Using Machine Learning Techniques. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_5

DOI: https://doi.org/10.1007/978-3-031-21753-1_5
Published: 21 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21752-4
Online ISBN: 978-3-031-21753-1
eBook Packages: Computer Science, Computer Science (R0)
