Authors:
Shane Cooke (1); Damien Graux (2) and Soumyabrata Dev (1)
Affiliations:
(1) ADAPT SFI Research Centre, School of Computer Science, University College Dublin, Ireland
(2) ADAPT SFI Research Centre, Trinity College Dublin, Ireland
Keyword(s):
Hate Speech Detection, Multi-Platform, Combining Embeddings, Classifiers.
Abstract:
A major issue faced by social media platforms today is the detection and handling of hateful speech. The intricacies and imperfections of online communication make this a difficult task, and the rapidly changing use of both hateful and non-hateful language online means that researchers must constantly update and modify their hate speech detection methodologies. In this study, we propose an accurate and versatile multi-platform model for the detection of hate speech, using first-hand data scraped from some of the most popular social media platforms, which we share with the community. We explore and optimise 50 different model approaches and evaluate their performance using several evaluation metrics. Overall, we successfully build a hate speech detection model, pairing the USE word embeddings with the SVC machine learning classifier, which obtains an average accuracy of 95.65% and a maximum accuracy of 96.89%. We also develop and share an application allowing users to test sentences against a collection of the most accurate hate speech detection models. Our application then returns an aggregated hate speech classification, together with a confidence level and a breakdown of the methodologies used to produce the final classification, for explainability.
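The abstract does not specify how the application combines the individual models' outputs. As a purely illustrative sketch, one simple possibility is a majority vote in which the confidence level is the fraction of models agreeing with the winning label. The function name `aggregate_votes`, the label strings, and the model names below are hypothetical and are not taken from the paper or its application.

```python
from collections import Counter


def aggregate_votes(predictions):
    """Combine per-model labels into one classification.

    `predictions` maps a model name to the label that model produced.
    Assumption: simple majority voting; the real application may
    aggregate differently (for example, with weighted votes).
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")

    counts = Counter(predictions.values())
    # Pick the most common label; ties are broken alphabetically so the
    # result is deterministic.
    label, votes = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]

    return {
        "label": label,
        # Share of models that agree with the winning label.
        "confidence": votes / len(predictions),
        # Per-model outputs, kept for explainability.
        "breakdown": dict(predictions),
    }


# Hypothetical model names and outputs, for illustration only.
result = aggregate_votes({
    "use_svc": "hate",
    "model_b": "hate",
    "model_c": "not_hate",
})
print(result["label"], round(result["confidence"], 2))
```

Returning the per-model breakdown alongside the aggregate keeps the final decision traceable to the individual classifiers, which is the explainability property the abstract describes.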