
1 Introduction

Social networks have been a battleground for their users for many years. Natural language reveals values, perspectives, and emotions. Among all types of hateful and abusive language, harassment tweets have been very dominant on social media platforms such as Twitter and Facebook. The Canadian Human Rights Commission defines harassment as a form of discrimination that includes any unwanted physical or verbal behavior that offends or humiliates someone.

Harassment can also be a way to silence the speech of others, especially women [8]. The extensive debate about the use of social media has made it clear that hate language is a catalyst for discrimination and social segregation. Thus, the concepts of sexism and harassment are closely related.

There are many definitions of sexism. For example, sexism is a form of discrimination against women [9]. However, sexism is not just about discrimination, and what is happening on social networks goes far beyond this definition. This observation drives the need to conceptualize sexism and harassment on social media. The categorization of sexism in social media into hostile or benevolent sexism has changed over the years, giving way to a more specific view of the types of harassment.

Waseem and Hovy [18] collected hateful tweets, categorizing them into sexism, racism, or neither. Later, Jha and Mamidi [9] focused on sexist tweets and proposed two categories, hostile and benevolent sexism. These categories were refined to a finer granularity in [15], which proposed distinct categories of sexism, including indirect harassment, information threat, sexual harassment, and physical harassment, providing a more comprehensive and in-depth categorization of online harassment in social media. Building on that work, and because applying automatic methods to strongly imbalanced data is a significant problem, techniques such as text augmentation and text generation [13] have been applied to achieve performance improvements.

In this paper, we focus on the categories proposed in [15]. Our approach applies self-attention models for harassment classification in order to combine the outputs of different baselines with a BERT-based representation of each tweet [5]. To accomplish this task, we use the transformer [16], a deep learning architecture that has been very successful for translation in natural language processing. The transformer can detect which parts of the ingested data are useful for solving a given task. As a consequence, the encoding learned by the transformer can consistently produce a better prediction of the harassment label than the ones provided by the baselines. Experiments on the proposed dataset show that our proposal reaches a macro-averaged F1-score of 0.481 with an accuracy of 0.764.

This work is organized as follows. In Sect. 2 we present a literature review. In Sect. 3 we introduce our proposal. Sections 4 and 5 present the experimental setup and results. We conclude in Sect. 6, providing final remarks and outlining future work.

2 Related Work

The massiveness with which users interact on social networks has driven new analytical tasks. Among them, the detection of hate speech in social media has captured the interest of the scientific community. Due to the massive volume of social media data, the need for automatic hate speech detection methods has become increasingly urgent.

Several works have approached the problem from a classic machine learning perspective [3, 4, 17]. These works generally combine features extracted from messages with features retrieved from user profiles, following a feature-engineering strategy. Combining both sources of information, several of these methods train supervised learning algorithms such as support vector machines or random forests. A limitation of many of these works is that they are sensitive to the imbalance of the labeled data. In practice, many of these methods fail to generalize well to other datasets, which limits their use in real environments. A thorough review of these types of methods is given in [12].

More sophisticated models, such as those studied in deep learning, have also been applied to the problem of hate speech detection. One of the advantages of deep learning architectures is that they allow the neural network to learn an adequate representation for the problem. The use of text encoders has given these types of models advantages over conventional machine learning models. For example, convolutional networks [7] have shown good results on the Waseem and Hovy dataset [18]. Recurrent neural networks based on the GRU architecture have also shown good results on this dataset [20]. Nearly perfect results on this dataset were also reported using deep learning by Badjatiya et al. [2]. Unfortunately, many of these models have overfitting problems and therefore do not transfer well to production. Recently, Arango et al. [1] showed that there are also problems in the construction of the datasets considered standard for the evaluation of this type of task. Among these problems, the most worrying is the population bias introduced when the samples that make up the dataset were collected. These works show that the hate speech detection problem is far from being solved.

A significant problem with these datasets is the imbalance between classes. Hate speech detection must be carried out in scenarios where most conversations are neutral and harassment is the exception. Being exceptional, however, does not make it less critical: the consequences that harassment and hate language have on social network users are fierce. To address the imbalance problem, the authors of [13] use text augmentation and text generation techniques to produce training data with balanced classes. Along the same line, Sharifirad et al. [15] show that a promising way to address the problem is to define finer-grained types of harassment. Based on this latest work, the SIMAH challenge defines a dataset with three types of harassment, which is the one addressed in our work.

Hate speech has many variants and has been at the center of attention of many researchers in recent years. Recently, the relationship between hate speech and mood detection has been shown to be a promising line of research [14], linking two tasks that might seem unconnected: sentiment analysis and hate speech detection. The advances in the consolidation of hate speech lexicons have also been impressive [4], which could allow unsupervised techniques, such as graph-based methods [10], to flourish for this task.

Far from being a task with mature and robust solutions, hate speech detection still poses many challenges. For more details on all its variants, the reader is referred to the survey by Fortuna and Nunes [6].

3 Proposal

The SIMAH competition defines two sub-tasks. The first task is a binary classification that separates online harassment tweets from non-harassment tweets. The second task is a multi-class classification of online harassment tweets into indirect harassment, sexual harassment, and physical harassment. Our proposal addresses both tasks jointly, without the need for a separate phase for each sub-task. The trained system employs an adaptation of the transformer [16] in order to exploit its self-attention modules.

3.1 Applying Self-attention

For a given tweet, each baseline provides a label from the set of possible tags, i.e., Non-Harassment, Indirect Harassment, Sexual Harassment, or Physical Harassment (Non-H, IH, SH, PH). To ingest these outputs into the transformer, we encode each label as a one-hot vector of its class, producing orthogonal vectors for different categories. To ingest the text of the tweet, we use its BERT vector (Bidirectional Encoder Representations from Transformers) [5], computed at the sentence level. BERT vectors are obtained through a library provided as a service by Google Research. Our model uses four baselines. We concatenate these four label vectors with the BERT vector of the tweet, and the result is ingested into the transformer. We use the encoder of the transformer with two layers, each with four attention heads. Each layer has an attention module and a position-wise feed-forward layer. The position-wise layer is a crucial module that allows the model to encode which baseline each input comes from. The outputs of the encoder are concatenated and combined through a Hadamard product, yielding a state vector that represents what the transformer learned from the baselines and the tweet. This vector is then fed into a softmax layer, which is in charge of producing the output. The model is depicted in Fig. 1.
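To make the architecture concrete, the following is a minimal PyTorch sketch of the fusion described above. It is an illustration under stated assumptions rather than the released implementation: the class name, the linear projection of the one-hot labels to the BERT dimensionality, and the exact layer sizes (taken from Sect. 4.2 where reported) are ours.

import torch
import torch.nn as nn

class HarassmentFusion(nn.Module):
    def __init__(self, n_classes=4, n_baselines=4, d_model=768,
                 n_heads=4, n_layers=2, d_ff=256, dropout=0.3):
        super().__init__()
        # Map each baseline's one-hot label to the BERT dimensionality
        self.label_proj = nn.Linear(n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=d_ff,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, baseline_onehots, bert_vec):
        # baseline_onehots: (batch, n_baselines, n_classes)
        # bert_vec: (batch, d_model) sentence-level BERT vector of the tweet
        tokens = torch.cat([self.label_proj(baseline_onehots),
                            bert_vec.unsqueeze(1)], dim=1)   # (batch, 5, d_model)
        encoded = self.encoder(tokens)                        # (batch, 5, d_model)
        # Hadamard (element-wise) product across the five encoded vectors
        state = torch.prod(encoded, dim=1)                    # (batch, d_model)
        return self.classifier(state)  # class logits; softmax applied in the loss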

Fig. 1. To apply self-attention, each baseline output is encoded and concatenated with the BERT vector of the tweet. This encoding is ingested into the transformer using two encoder layers, each with a self-attention module with four attention heads. At the output of the encoder, we apply a Hadamard product. The resulting vector is fed into a softmax layer that produces the output.

Although the original architecture [16] was proposed as a sequence transduction model based on an encoder-decoder structure, we only use the encoder of the transformer with its attention mechanisms. Hence, the transformer uses stacked self-attention and fully connected layers. The encoder we use is composed of a stack of two layers. The transformer applies a residual connection around each module, followed by layer normalization; these residual connections are possible because all modules produce outputs with the same dimension as their inputs. The attention mechanism of the transformer is built on a scaled dot-product operator, and multi-head attention consists of several such attention layers running in parallel. After the attention module, a position-wise feed-forward module is applied to each position, consisting of two linear transformations with a ReLU activation in between. The output sequence produced by the encoder gives five vectors, one for each input ingested into the encoder, which are then combined using the Hadamard product.
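For reference, the attention and feed-forward operations used inside each encoder layer follow the standard formulation of [16]:

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \]

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \]

\[ \mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2 \]

where \(d_k\) is the dimension of the keys and \(h\) is the number of attention heads (four in our configuration).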

3.2 Baselines

One of the baseline models is based on convolutional neural networks (CNN) and the other three on recurrent neural networks (RNN). One RNN uses a single recurrent layer, while the other two use two layers (as does the CNN). Note that, in the case of the RNNs, we used GRU layers. For the RNN baselines, the output was produced using a softmax with focal loss as the loss function, while for the CNN we used categorical cross-entropy. Table 1 shows the parameters of each architecture.
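As an illustration, the following is a minimal PyTorch sketch of one of the recurrent baselines: a single GRU layer over the GloVe word vectors of a tweet, followed by a linear output layer. The hidden size is an assumption for illustration; the actual hyper-parameters are those reported in Table 1.

import torch
import torch.nn as nn

class GRUBaseline(nn.Module):
    def __init__(self, emb_dim=100, hidden_size=128, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, word_vecs):
        # word_vecs: (batch, seq_len, emb_dim) GloVe vectors of a batch of tweets
        _, h = self.gru(word_vecs)       # h: (1, batch, hidden_size)
        return self.out(h.squeeze(0))    # class logits; softmax applied in the loss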

4 Experiments

4.1 Data

The dataset provided for this challenge contains 10622 annotated tweets, split into training, validation, and testing partitions, as shown in Table 2. The competition has two related tasks: the first one is a binary classification (harassment or non-harassment tweet), and the second is a multi-class classification of online harassment tweets into three categories: indirect harassment, sexual harassment, and physical harassment.

Table 1. Architecture of our baseline models. CNN1_cce: each convolutional layer is followed by a max_pool (stride = 3) and a batch normalization layer
Table 2. Distribution of tweets into training, validation, and testing data partitions across each class. *Two tweets were labeled as harassment posts but they were not classified into any valid category for the second task.

As social media data sources are unstructured and noisy, we need to apply some transformations to the irregular input text. Accordingly, we removed stopwords, punctuation marks, and digits, and transformed the text to lowercase. It is worth mentioning that we keep question marks and exclamation marks, since they have proven to be helpful [19]. To handle jargon, we removed emojis. In addition, HTML links were replaced with the token <url>, #word terms with <hashtag>, @word terms with <user>, and numerical terms with <number>.
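The following is a minimal Python sketch of this preprocessing pipeline. The regular expressions are illustrative approximations of the rules described above; stopword removal is omitted for brevity.

import re

def preprocess(tweet: str) -> str:
    t = tweet.lower()
    t = re.sub(r"https?://\S+", " <url> ", t)    # links
    t = re.sub(r"#\w+", " <hashtag> ", t)        # #word terms
    t = re.sub(r"@\w+", " <user> ", t)           # @word terms
    t = re.sub(r"\d+", " <number> ", t)          # numerical terms
    t = re.sub(r"[^\w\s<>?!]", " ", t)           # drop punctuation and emojis, keep ? and !
    return re.sub(r"\s+", " ", t).strip()

# Example: preprocess("@user Check http://t.co/x #metoo!!") -> "<user> check <url> <hashtag> !!"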

4.2 Training

Baselines. Once each tweet was preprocessed, we used GloVe [11] word embeddings (pre-trained on a Twitter corpus) to represent each word in each tweet. These 100-dimensional vectors were used in the four baselines and were ingested as a sequence of word vectors, one tweet at a time.
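A minimal sketch of this lookup is shown below. The embedding file name corresponds to the publicly released 100-dimensional Twitter GloVe vectors, and the zero-vector fallback for out-of-vocabulary words is an assumption.

import numpy as np

def load_glove(path="glove.twitter.27B.100d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def tweet_to_sequence(tokens, vectors, dim=100):
    # one 100-d GloVe vector per token; unknown words fall back to zeros
    if not tokens:
        return np.zeros((1, dim), dtype=np.float32)
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32)) for w in tokens])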

Transformer. Once the baselines’ outputs were computed, we encoded each output using class vectors with 768 dimensions, to be consistent with the dimensionality of BERT. Each tweet was encoded using bert-as-service, a library that maps a variable-length sentence to a fixed-length vector with 768 dimensions. In the transformer, we used gradient descent to update the parameters. The size of the hidden units was set to 256, with a dropout of 0.3. We trained for 100 epochs, varying the learning rate throughout training according to the recommendations of the transformer’s authors, with a warmup of 500 steps and a factor of 3. As the loss function, we used focal loss with class weights inversely proportional to each class size.
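The following sketch illustrates the class-weighted focal loss and the warmup learning-rate schedule described above. The focusing parameter gamma is not reported in the text, so the value below is an assumption; the schedule follows the formula recommended by the transformer’s authors.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights, gamma=2.0):
    # class_weights: 1-D tensor with weights inversely proportional to class size
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    w = class_weights[targets]
    return (-w * (1.0 - pt) ** gamma * log_pt).mean()

def warmup_lr(step, d_model=768, factor=3, warmup=500):
    # lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    step = max(step, 1)
    return factor * (d_model ** -0.5) * min(step ** -0.5, step * (warmup ** -1.5))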

5 Results and Discussion

The performance of our model on the validation and testing sets is shown in Tables 3 and 4. Together with the accuracy, Table 3 shows the macro-averaged F1-score and the per-class F1-scores, as these metrics account for the class imbalance. Accuracy reaches values that could be considered quite good given the number of classes. However, the F1-score is much lower than expected. The imbalance between classes explains this fact: the minority classes are the most difficult to detect, which explains why these classes reach an F1-score of about 20% (16.7% for indirect harassment tweets).

Table 3. Results on the development and testing sets. Accuracy and F1-scores: macro-averaged and per class
Table 4. Results on the development and testing sets. Precision and Recall macro-averaged and per class

Table 4 shows the precision and recall per class (Non-H: Non-Harassment, IH: Indirect Harassment, SH: Sexual Harassment, PH: Physical Harassment) as well as the macro-averaged metrics. The precision and recall values are comparable between the validation and testing partitions. However, the IH and PH classes have low recall, although their precision is good enough given the complexity of the task. This indicates that, of the examples classified as indirect or physical harassment tweets, an acceptable portion is correctly labeled, but the number of recovered examples is very small. In other words, the predictions for both classes are only slightly contaminated but rather incomplete, especially for the IH class, with a recall of only 13.7%. This occurs due to the difference between the distribution of the examples in the training partition and the evaluation partitions, which made it hard to recognize and extract patterns correctly.

6 Conclusions

We have presented a method based on the transformer architecture for harassment detection and classification. Experimental results show that our model can detect a substantial proportion of the hardest classes of this challenging task. Our architecture achieves a macro-averaged F1-score of 0.481 on the SIMAH competition dataset.

We are currently extending this work to improve its performance. One change we are making is to replace the one-hot encoders of the baselines with their confidence vectors. Another promising line is to use data augmentation techniques to handle the imbalance in minority classes. The use of SMOTE techniques is promising in this line of work.
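As an illustration of this last idea, a minimal sketch of oversampling the minority classes with SMOTE from the imbalanced-learn library is shown below; X_train and y_train are placeholders for fixed-length tweet representations (e.g., the 768-dimensional BERT vectors) and their labels, not variables from our system.

from imblearn.over_sampling import SMOTE

# X_train: (n_tweets, n_features) tweet representations; y_train: class labels
smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X_train, y_train)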