Fake News Detection via English-to-Spanish Translation: Is It Really Useful?

Ruíz, Sebastián; Providel, Eliana; Mendoza, Marcelo

doi:10.1007/978-3-030-77626-8_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12774)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

Social networks are used every day to report daily events, although the information published on them often turns out to be fake news. Detecting such fake news has become a research topic that can be approached using deep learning. However, most current research on the topic addresses only the English language. When working on fake news detection in other languages, such as Spanish, one of the barriers is the scarcity of labeled datasets. Hence, we explore whether it is worthwhile to translate an English dataset to Spanish using Statistical Machine Translation. We use the translated dataset to evaluate the accuracy of several deep learning architectures and compare the results obtained on the translated dataset with those obtained on the original dataset in fake news classification. Our results suggest that the approach is feasible, although it requires high-quality translation techniques, such as those offered by neural machine translation models.
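
The comparison described in the abstract (the same labeled items scored once in the original English and once in Spanish translation) can be sketched roughly as follows. This is only an illustration and not the authors' code: the function names `accuracy` and `translation_gap` are hypothetical, and the classifiers are passed in as plain callables, so any model could stand in for the deep learning architectures evaluated in the paper.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type alias: a classifier maps a text to a label (1 = fake, 0 = real).
Classifier = Callable[[str], int]


def accuracy(predict: Classifier, texts: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of texts whose predicted label matches the gold label."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for text, gold in zip(texts, labels) if predict(text) == gold)
    return correct / len(texts)


def translation_gap(
    predict_original: Classifier,
    predict_translated: Classifier,
    texts_original: Sequence[str],
    texts_translated: Sequence[str],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Compare accuracy on the original dataset and on its translation.

    Both datasets are assumed to be parallel: item i of the translated set
    is the translation of item i of the original set, so they share labels.
    """
    acc_original = accuracy(predict_original, texts_original, labels)
    acc_translated = accuracy(predict_translated, texts_translated, labels)
    return {
        "original": acc_original,
        "translated": acc_translated,
        "gap": acc_original - acc_translated,
    }
```

A positive `gap` would indicate that accuracy drops on the translated data. Whether such a drop comes from translation quality or from the classifier itself is not something this sketch can separate.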

Acknowledgements

Mr. Mendoza acknowledges funding from the Millennium Institute for Foundational Research on Data. Mr. Mendoza was also funded by ANID PIA/APOYO AFB180002 and ANID FONDECYT 1200211.

Author information

Authors and Affiliations

Escuela de Ingeniería Civil Informática, Universidad de Valparaíso, Valparaíso, Chile
Sebastián Ruíz & Eliana Providel
Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
Eliana Providel & Marcelo Mendoza

Corresponding author

Correspondence to Marcelo Mendoza.

Editor information

Editors and Affiliations

Department of Computer Science, Towson University, Towson, MD, USA
Gabriele Meiselwitz

About this paper

Cite this paper

Ruíz, S., Providel, E., Mendoza, M. (2021). Fake News Detection via English-to-Spanish Translation: Is It Really Useful? In: Meiselwitz, G. (ed.) Social Computing and Social Media: Experience Design and Social Network Analysis. HCII 2021. Lecture Notes in Computer Science, vol 12774. Springer, Cham. https://doi.org/10.1007/978-3-030-77626-8_9

DOI: https://doi.org/10.1007/978-3-030-77626-8_9
Published: 03 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77625-1
Online ISBN: 978-3-030-77626-8
eBook Packages: Computer Science, Computer Science (R0)
