
1 Introduction

Social networks have been a battleground for their users for many years. Natural language reveals values, perspectives, and emotions. Among all types of hateful and abusive language, harassment tweets have been very dominant on social media platforms such as Twitter and Facebook. The Canadian Human Rights Commission defines harassment as a form of discrimination that includes any unwanted physical or verbal behavior that offends or humiliates someone.

Harassment can also be a way to silence the speech of others, especially women [8]. The extensive debate about the use of social media has made it clear that hate language is a catalyst for discrimination and social segregation. Thus, the concepts of sexism and harassment are closely related.

There are many definitions of sexism. For example, sexism is a form of discrimination against women [9]. However, sexism is not just about discrimination, and what is happening on social networks goes far beyond this definition. This observation drives the need to conceptualize sexism and harassment on social media. The categorization of sexism in social media into hostile or benevolent sexism has changed over the years, giving way to a more specific view of the types of harassment.

Waseem and Hovy [18] collected hateful tweets, categorizing them into sexism, racism, or neither. Later, Jha and Mamidi [9] focused on sexist tweets and proposed two categories, hostile and benevolent sexism. These categories were refined to a finer granularity in [15], which proposed distinct categories of sexism, including indirect harassment, information threat, sexual harassment, and physical harassment, providing a more comprehensive and in-depth categorization of online harassment in social media. Building on that work, and because applying automatic methods to strongly imbalanced data is a significant problem, techniques such as text augmentation and text generation [13] have been applied to achieve performance improvements.

In this paper, we focus on the categories proposed in [15]. Our approach applies self-attention models for harassment classification in order to combine the outputs of different baselines with a BERT-based representation of each tweet [5]. To accomplish this task, we use the transformer [16], a deep learning architecture that has been very successful for translation in natural language processing. The transformer can detect which parts of the ingested data are useful for solving a given task. As a consequence, the encoding learned by the transformer can consistently produce a better prediction of the harassment label than the ones provided by the baselines. Experiments on the proposed dataset show that our proposal reaches a macro-averaged F1-score of 0.481 with an accuracy of 0.764.

This work is organized as follows. In Sect. 2 we present a literature review. In Sect. 3 we introduce our proposal. Sections 4 and 5 present the experimental setup and results. We conclude in Sect. 6, providing final remarks and outlining future work.

2 Related Work

The massiveness with which users interact on social networks has driven new analytical tasks. Among them, the detection of hate speech in social media has captured the interest of the scientific community. Due to the massive volume of social media data, the need for automatic hate speech detection methods has become increasingly urgent.

Several works have approached the problem from a classic machine learning perspective [3, 4, 17]. These works generally combine features extracted from messages with features retrieved from user profiles, following a feature-engineering strategy. Combining both sources of information, several of these methods train supervised learning algorithms such as support vector machines or random forests. A limitation of many of these works is that they are sensitive to the imbalance of the labeled data. In practice, many of these methods fail to generalize well to other datasets, which limits their use in real environments. A thorough review of these types of methods is given in [12].

More sophisticated models, such as those studied in deep learning, have also been applied to the problem of hate speech detection. One of the advantages of deep learning architectures is that they allow the neural network to learn an adequate representation for the problem. The use of text encoders has given these types of models advantages over conventional machine learning models. For example, convolutional networks [7] have shown good results on the Waseem and Hovy dataset [18]. Recurrent neural networks based on the GRU architecture have also shown good results on this dataset [20]. Nearly perfect results on this dataset were also reported using deep learning by Badjatiya et al. [2]. Unfortunately, many of these models have overfitting problems and therefore do not transfer well to production. Recently, Arango et al. [1] showed that there are also problems in the construction of the datasets considered standard for the evaluation of this type of task. Among these problems, the most worrying is the population bias introduced when the samples that make up the dataset were collected. These works show that the hate speech detection problem is far from being solved.

A significant problem with these datasets is the imbalance between classes. Hate speech detection must be carried out in scenarios where most conversations are neutral and harassment is the exception. Being exceptional, however, does not make it less critical: the consequences that harassment and hate language have on social network users are fierce. To address the imbalance problem, the authors of [13] use text augmentation and text generation techniques to produce training data with balanced classes. Along the same line, Sharifirad et al. [15] show that a promising way to address the problem is to define finer-grained types of harassment. Based on this latest work, the SIMAH challenge defines a dataset with three types of harassment, which is the one addressed in our work.

Hate speech has many variants and has been at the center of attention of many researchers in recent years. Recently, the relationship between hate speech and mood detection has been shown to be a promising line of research [14], linking two tasks that might seem unconnected: sentiment analysis and hate speech detection. The advances in the consolidation of hate speech lexicons have also been impressive [4], which could allow unsupervised techniques, such as graph-based methods [10], to flourish for this task.

Far from being a task with mature and robust solutions, hate speech detection still poses many challenges. For more details on all its variants, the reader is referred to the survey by Fortuna and Nunes [6].

3 Proposal

The SIMAH competition defines two sub-tasks. The first task is a binary classification that separates online harassment tweets from non-harassment tweets. The second task is a multi-class classification of online harassment tweets into indirect harassment, sexual harassment, and physical harassment. Our proposal addresses both tasks jointly, without the need for a separate phase for each sub-task. The trained system employs an adaptation of the transformer [16] in order to exploit its self-attention modules.

3.1 Applying Self-attention

For a given tweet, each baseline provides a label from the set of possible tags, i.e., Non-Harassment, Indirect Harassment, Sexual Harassment, or Physical Harassment (Non-H, IH, SH, PH). To ingest these outputs into the transformer, we encode each label as a one-hot vector of its class, producing orthogonal vectors for different categories. To ingest the text of the tweet, we use its BERT vector (Bidirectional Encoder Representations from Transformers) [5], computed at the sentence level. BERT vectors are obtained through a library provided as a service by Google Research. Our model uses four baselines. We concatenate these four label vectors with the BERT vector of the tweet, and the result is ingested into the transformer. We use the encoder of the transformer with two layers, each with four attention heads. Each layer has an attention module and a position-wise feed-forward layer. The position-wise layer is a crucial module that allows the model to encode which baseline each input comes from. The outputs of the encoder are concatenated and combined through a Hadamard product, yielding a state vector that represents what the transformer learned from the baselines and the tweet. This vector is then fed into a softmax layer, which is in charge of producing the output. The model is depicted in Fig. 1.
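To make the architecture concrete, the following is a minimal PyTorch sketch of the fusion described above. It is an illustration under stated assumptions rather than the released implementation: the class name, the linear projection of the one-hot labels to the BERT dimensionality, and the exact layer sizes (taken from Sect. 4.2 where reported) are ours.

import torch
import torch.nn as nn

class HarassmentFusion(nn.Module):
    def __init__(self, n_classes=4, n_baselines=4, d_model=768,
                 n_heads=4, n_layers=2, d_ff=256, dropout=0.3):
        super().__init__()
        # Map each baseline's one-hot label to the BERT dimensionality
        self.label_proj = nn.Linear(n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=d_ff,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, baseline_onehots, bert_vec):
        # baseline_onehots: (batch, n_baselines, n_classes)
        # bert_vec: (batch, d_model) sentence-level BERT vector of the tweet
        tokens = torch.cat([self.label_proj(baseline_onehots),
                            bert_vec.unsqueeze(1)], dim=1)   # (batch, 5, d_model)
        encoded = self.encoder(tokens)                        # (batch, 5, d_model)
        # Hadamard (element-wise) product across the five encoded vectors
        state = torch.prod(encoded, dim=1)                    # (batch, d_model)
        return self.classifier(state)  # class logits; softmax applied in the loss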

Fig. 1. To apply self-attention, each baseline output is encoded and concatenated with the BERT vector of the tweet. This encoding is ingested into the transformer using two encoder layers, each with a self-attention module with four attention heads. At the output of the encoder, we apply a Hadamard product. The resulting vector is fed into a softmax layer that produces the output.

Although the original architecture [16] was proposed as a sequence transduction model based on an encoder-decoder structure, we only use the encoder of the transformer with its attention mechanisms. Hence, the transformer uses stacked self-attention and fully connected layers. The encoder we use is composed of a stack of two layers. The transformer applies a residual connection around each module, followed by layer normalization; these residual connections are possible because all modules produce outputs with the same dimension as their inputs. The attention mechanism of the transformer is built on a scaled dot-product operator, and multi-head attention consists of several such attention layers running in parallel. After the attention module, a position-wise feed-forward module is applied to each position, consisting of two linear transformations with a ReLU activation in between. The output sequence produced by the encoder gives five vectors, one for each input ingested into the encoder, which are then combined using the Hadamard product.
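For reference, the attention and feed-forward operations used inside each encoder layer follow the standard formulation of [16]:

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \]

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \]

\[ \mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2 \]

where \(d_k\) is the dimension of the keys and \(h\) is the number of attention heads (four in our configuration).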

3.2 Baselines

One of the baseline models is based on convolutional neural networks (CNN) and the other three on recurrent neural networks (RNN). One RNN uses a single recurrent layer, while the other two use two layers (as does the CNN). Note that, in the case of the RNNs, we used GRU layers. For the RNN baselines, the output was produced using a softmax with focal loss as the loss function, while for the CNN we used categorical cross-entropy. Table 1 shows the parameters of each architecture.
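As an illustration, the following is a minimal PyTorch sketch of one of the recurrent baselines: a single GRU layer over the GloVe word vectors of a tweet, followed by a linear output layer. The hidden size is an assumption for illustration; the actual hyper-parameters are those reported in Table 1.

import torch
import torch.nn as nn

class GRUBaseline(nn.Module):
    def __init__(self, emb_dim=100, hidden_size=128, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, word_vecs):
        # word_vecs: (batch, seq_len, emb_dim) GloVe vectors of a batch of tweets
        _, h = self.gru(word_vecs)       # h: (1, batch, hidden_size)
        return self.out(h.squeeze(0))    # class logits; softmax applied in the loss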

4 Experiments

4.1 Data

The dataset provided for this challenge contains 10622 annotated tweets, split into training, validation, and testing partitions, as shown in Table 2. The competition has two related tasks: the first one is a binary classification (harassment or non-harassment tweet), and the second is a multi-class classification of online harassment tweets into three categories: indirect harassment, sexual harassment, and physical harassment.

Table 1. Architecture of our baseline models. CNN1_cce: each convolutional layer is followed by a max_pool (stride = 3) and a batch normalization layer
Table 2. Distribution of tweets into training, validation, and testing data partitions across each class. *Two tweets were labeled as harassment posts but they were not classified into any valid category for the second task.

As social media data sources are unstructured and noisy, we need to apply some transformations to the irregular input text. Accordingly, we removed stopwords, punctuation marks, and digits, and transformed the text to lowercase. It is worth mentioning that we keep question marks and exclamation marks, since they have proven to be helpful [19]. To handle jargon, we removed emojis. In addition, HTML links were replaced with the token <url>, #word terms with <hashtag>, @word terms with <user>, and numerical terms with <number>.
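The following is a minimal Python sketch of this preprocessing pipeline. The regular expressions are illustrative approximations of the rules described above; stopword removal is omitted for brevity.

import re

def preprocess(tweet: str) -> str:
    t = tweet.lower()
    t = re.sub(r"https?://\S+", " <url> ", t)    # links
    t = re.sub(r"#\w+", " <hashtag> ", t)        # #word terms
    t = re.sub(r"@\w+", " <user> ", t)           # @word terms
    t = re.sub(r"\d+", " <number> ", t)          # numerical terms
    t = re.sub(r"[^\w\s<>?!]", " ", t)           # drop punctuation and emojis, keep ? and !
    return re.sub(r"\s+", " ", t).strip()

# Example: preprocess("@user Check http://t.co/x #metoo!!") -> "<user> check <url> <hashtag> !!"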

4.2 Training

Baselines. Once each tweet was preprocessed, we used GloVe [11] word embeddings (pre-trained on a Twitter corpus) to represent each word in each tweet. These 100-dimensional vectors were used in the four baselines and were ingested as a sequence of word vectors, one tweet at a time.
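A minimal sketch of this lookup is shown below. The embedding file name corresponds to the publicly released 100-dimensional Twitter GloVe vectors, and the zero-vector fallback for out-of-vocabulary words is an assumption.

import numpy as np

def load_glove(path="glove.twitter.27B.100d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def tweet_to_sequence(tokens, vectors, dim=100):
    # one 100-d GloVe vector per token; unknown words fall back to zeros
    if not tokens:
        return np.zeros((1, dim), dtype=np.float32)
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32)) for w in tokens])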

Transformer. Once the baselines’ outputs were computed, we encoded each output using class vectors with 768 dimensions, to be consistent with the dimensionality of BERT. Each tweet was encoded using bert-as-service, a library that maps a variable-length sentence to a fixed-length vector with 768 dimensions. In the transformer, we used gradient descent to update the parameters. The size of the hidden units was set to 256, with a dropout of 0.3. We trained for 100 epochs, varying the learning rate throughout training according to the recommendations of the transformer’s authors, with a warmup of 500 steps and a factor of 3. As the loss function, we used focal loss with class weights inversely proportional to each class size.
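The following sketch illustrates the class-weighted focal loss and the warmup learning-rate schedule described above. The focusing parameter gamma is not reported in the text, so the value below is an assumption; the schedule follows the formula recommended by the transformer’s authors.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights, gamma=2.0):
    # class_weights: 1-D tensor with weights inversely proportional to class size
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    w = class_weights[targets]
    return (-w * (1.0 - pt) ** gamma * log_pt).mean()

def warmup_lr(step, d_model=768, factor=3, warmup=500):
    # lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    step = max(step, 1)
    return factor * (d_model ** -0.5) * min(step ** -0.5, step * (warmup ** -1.5))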

5 Results and Discussion

The performance of our model on the validation and testing sets is shown in Tables 3 and 4. Together with the accuracy, Table 3 shows the macro-averaged F1-score and the per-class F1-scores, as these metrics account for the class imbalance. Accuracy reaches values that could be considered quite good given the number of classes. However, the F1-score is much lower than expected. The imbalance between classes explains this fact: the minority classes are the most difficult to detect, which explains why these classes reach an F1-score of about 20% (16.7% for indirect harassment tweets).

Table 3. Results on the development and testing sets. Accuracy and F1-scores: macro-averaged and per class
Table 4. Results on the development and testing sets. Precision and Recall macro-averaged and per class

Table 4 shows the precision and recall per class (Non-H: Non-Harassment, IH: Indirect Harassment, SH: Sexual Harassment, PH: Physical Harassment) as well as the macro-averaged metrics. The precision and recall values are comparable between the validation and testing partitions. However, the IH and PH classes have low recall, although their precision is good enough given the complexity of the task. This indicates that, of the examples classified as indirect or physical harassment tweets, an acceptable portion is correctly labeled, but the number of recovered examples is very small. In other words, the predictions for both classes are only slightly contaminated but rather incomplete, especially for the IH class, with a recall of only 13.7%. This occurs due to the difference between the distribution of the examples in the training partition and the evaluation partitions, which made it hard to recognize and extract patterns correctly.

6 Conclusions

We have presented a method based on the transformer architecture for harassment detection and classification. Experimental results show that our model can detect a substantial proportion of the hardest classes of this challenging task. Our architecture achieves a macro-averaged F1-score of 0.481 on the SIMAH competition dataset.

We are currently extending this work to improve its performance. One change we are making is to replace the one-hot encoders of the baselines with their confidence vectors. Another promising line is to use data augmentation techniques to handle the imbalance in minority classes. The use of SMOTE techniques is promising in this line of work.
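As an illustration of this last idea, a minimal sketch of oversampling the minority classes with SMOTE from the imbalanced-learn library is shown below; X_train and y_train are placeholders for fixed-length tweet representations (e.g., the 768-dimensional BERT vectors) and their labels, not variables from our system.

from imblearn.over_sampling import SMOTE

# X_train: (n_tweets, n_features) tweet representations; y_train: class labels
smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X_train, y_train)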