Abstract
This research focuses on the detection of false claims in Spanish using machine learning techniques. Most current work on automated or semi-automated fake news detection is carried out for English; there is still considerable room for improvement in other languages such as Spanish. Detecting fake news content and its spread on online platforms is an open and hard problem. This work focuses on false and misleading information that spread during the Spanish parliamentary election campaign and the Catalonia crisis in 2019, the migration crisis, and the COVID-19 pandemic, as well as hate speech against minorities. We propose the use of BERT, a machine learning model adapted to human language understanding tasks, which has been trained and experimentally tested. We have collected a corpus of different types of false information and claims, including articles, Facebook posts, WhatsApp messages, tweets, and others. The results show that machine learning techniques can help identify false statements with more than 88% accuracy and can support the collection of samples of false information. Experiments were also carried out on previously published datasets to compare different machine learning methods and approaches.
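As a rough illustration of how an accuracy figure like the one reported above is computed and compared across methods, the following minimal sketch scores two sets of predictions against gold labels. It is not the authors' evaluation code: the labels, the predictions, and the helper names `accuracy` and `majority_baseline` are hypothetical, and the majority-class baseline merely stands in for "another method" to compare against.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(gold: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def majority_baseline(train_labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent training label (a trivial reference method)."""
    return Counter(train_labels).most_common(1)[0][0]


# Hypothetical toy data: "false" marks a false claim, "true" a truthful one.
train_labels = ["false", "false", "true"]
gold = ["false", "true", "false", "false", "true", "false", "true", "false"]

# Hypothetical outputs of some classifier (assumed here, not produced by BERT).
model_predictions = ["false", "true", "false", "true", "true", "false", "true", "false"]

# The baseline predicts the majority training label for every test item.
baseline_label = majority_baseline(train_labels)
baseline_predictions = [baseline_label] * len(gold)

model_acc = accuracy(gold, model_predictions)
baseline_acc = accuracy(gold, baseline_predictions)
print(f"model accuracy:    {model_acc:.3f}")
print(f"baseline accuracy: {baseline_acc:.3f}")
```

Reporting a trivial baseline next to the model's score makes it easier to judge how much of the accuracy comes from the class distribution alone.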
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tretiakov, A., Martín, A., Camacho, D. (2022). Detection of False Information in Spanish Using Machine Learning Techniques. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_5
DOI: https://doi.org/10.1007/978-3-031-21753-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21752-4
Online ISBN: 978-3-031-21753-1
eBook Packages: Computer Science (R0)