Using Machine Learning to Detect the Signs of Radicalization and Hate Speech on Twitter

Kuchczyński, Marcin; Pawlicka, Aleksandra; Pawlicki, Marek; Choraś, Michał

doi:10.1007/978-3-030-81523-3_21

Conference paper
First Online: 18 August 2021

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 255)

Abstract

The paper addresses hate speech and radicalization, which are often spread through social media. Twitter lets users express their views in a relatively anonymous way; however, it also appears to be a simple yet effective tool for disseminating offensive or radical content. The paper proposes an effective solution that applies machine learning to detect signs of radicalization and hate speech in Twitter posts. The authors chose to work with Polish, a language whose complexity is known to pose a challenge for automated sentiment analysis. They also had to create their own dataset of posts containing hate speech because, prior to the experiment, no such dataset existed in the language. The paper first presents the underlying technologies, then describes the course of the experiment, and finally gives the conclusions.

Author information

Authors and Affiliations

UTP University of Science and Technology, Bydgoszcz, Poland
Marcin Kuchczyński, Marek Pawlicki & Michał Choraś
ITTI Sp. z o.o., Poznań, Poland
Aleksandra Pawlicka

Corresponding author

Correspondence to Aleksandra Pawlicka.

Editor information

Editors and Affiliations

Institute of Computer Science and Telecommunications, University of Science and Technology, Bydgoszcz, Poland
Michał Choraś
Institute of Computer Science and Telecommunications, University of Science and Technology, Bydgoszcz, Poland
Ryszard S. Choraś
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Marek Kurzyński
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Paweł Trajdos
West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Jerzy Pejaś
West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Tomasz Hyla

Cite this paper

Kuchczyński, M., Pawlicka, A., Pawlicki, M., Choraś, M. (2022). Using Machine Learning to Detect the Signs of Radicalization and Hate Speech on Twitter. In: Choraś, M., Choraś, R.S., Kurzyński, M., Trajdos, P., Pejaś, J., Hyla, T. (eds) Progress in Image Processing, Pattern Recognition and Communication Systems. CORES 2021, IP&C 2021, ACS 2021. Lecture Notes in Networks and Systems, vol 255. Springer, Cham. https://doi.org/10.1007/978-3-030-81523-3_21

DOI: https://doi.org/10.1007/978-3-030-81523-3_21
Published: 18 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81522-6
Online ISBN: 978-3-030-81523-3
eBook Packages: Intelligent Technologies and Robotics (R0)
