Detecting racial stereotypes: An Italian social media corpus where psychology meets NLP

https://doi.org/10.1016/j.ipm.2022.103118

Highlights

  • The paper offers a psychological and computational perspective on racial stereotypes.

  • It provides a novel fine-grained annotation scheme for racial stereotypes.

  • It sheds light on how to describe and model stereotypes in hoaxes against immigrants.

  • A novel Italian corpus from Facebook is annotated to validate this scheme.

  • The dataset is lexically analyzed and used for training classifiers.

Abstract

The generation of stereotypes allows us to simplify the cognitive complexity we deal with in everyday life. Stereotypes are extensively used to describe people who belong to a different ethnic group, particularly in racial hoaxes and hateful content against immigrants. This paper addresses the study of stereotypes from a novel perspective that involves both psychology and computational linguistics. On the one hand, it describes an Italian social media corpus built within a social psychology study, where stereotypes and related forms of discredit were made explicit through annotation. On the other hand, it provides a lexical analysis that brings out the linguistic features of the messages collected in the corpus, as well as experiments for validating the annotation scheme and enabling its automatic application to other corpora in the future. The main expected outcome is to shed some light on the usefulness of this scheme for training tools that automatically detect and label stereotypes in Italian.

Introduction

In the last few years, the media have often covered local and international social issues and events concerning immigration, such as the war in Syria, the persistent state of crisis in many African countries, and the resulting surge in migration flows toward Europe. Newspapers, TV channels and social networks offer news, information and opinions related to multi-ethnic relationships. They may also spread a racist and discriminatory discourse based on ethnic prejudices and stereotypes, for instance through hoaxes focused on racial differences. Sometimes such forms of communication also include Hate Speech (henceforth HS) and compel us to think about and deal with issues related to social conflicts.

A variety of answers to these societal challenges has been proposed by policymakers, governments and civil society organizations, but also by researchers involved in local and European projects for monitoring, preventing and countering the spread of racism and xenophobia in our society.1

To counter HS, the effort of researchers from several disciplinary fields is currently focused on mass media and on different facets of the language used in these contexts. Data network analysis, social psychology and discourse analysis are devoted to studying and monitoring them; but the social issues they raise impose challenges and expectations especially on Artificial Intelligence in general and on Computational Linguistics in particular. Consequently, in the last few years the detection of hateful content in social media has been among the hottest topics in Natural Language Processing (NLP), text classification and opinion mining (Basile et al., 2019, Bosco et al., 2018, Pang and Lee, 2008).

On the one hand, online communication, and in particular so-called user-generated content (UGC), offers the largest amount of data ever available (i.e. big data), in which HS and related phenomena are plentifully represented. On the other hand, several studies confirm that stereotypes, which are the cognitive basis of HS, are learned through various forms of socialization and especially through public discourse, spoken and written, within mass media and through interpersonal conversations influenced by such public discourse (van Dijk, 2016), like those collected in social networks. The notion of HS is related in the literature to those of discrimination and stereotype (Bauwelinck and Lefever, 2019, van Dijk, 2016, Fiske, 1998), which are the cognitive and behavioral counterparts of this phenomenon in human social life.

The first objective of this paper is to introduce a novel corpus of Italian social media texts built within the context of social psychology research (D’Errico & Paciello, 2018) aimed at exploring the socio-cognitive mechanisms underlying opposition to the hosting of immigrants, and then annotated to make explicit stereotypes against immigrants and the related forms of discredit. Beyond the opportunity this project offers to deepen the theme of multi-ethnic relationships from the standpoint of social psychology, it also allowed us to pursue a second objective: contributing to the advancement of computational linguistics research on the automatic detection and annotation of stereotypes in texts.

This corpus, exploited in a previous release within social psychology studies, is a collection of Facebook messages now made suitable also for training and testing NLP tools for Italian. This effort is coordinated by the Department of Formation, Psychology and Communication of the University of Bari “Aldo Moro” and also forms part of the Hate Speech Monitoring program of the Computer Science Department of the University of Turin2, which aims at detecting, analyzing and countering HS through an inter-disciplinary approach (Bosco et al., 2017) within the context of the international project STERHEOTYPES3.

We can summarize the main steps of the methodology applied in this study as follows. First of all, we discuss the design and application of a novel fine-grained annotation scheme. This step allows us to thoroughly analyze how, and using which specific discredit forms, people express racial stereotypes, by referring to a real social context and a specific situation where discrimination and racism arise. The main goal of this paper is indeed to pave the way for the improvement of linguistic resources and tools for automatic stereotype detection and annotation, but also to show how to take into account other aspects involved in the generation of hateful content, in order to highlight the multiple facets of hateful communication.

Secondly, to validate the scheme after its application to the FB-Stereotypes corpus,4 we perform a lexical analysis in which word n-grams collected from the dataset are observed and compared with those drawn from other, smaller datasets annotated with the same scheme. In particular, we exploit a sample of tweets collected as reactions to a set of racial hoaxes and another sample from the benchmark corpus of an HS detection shared task.
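As an illustration of this kind of comparison, the sketch below extracts and contrasts the most frequent word bigrams of two corpora. It is a minimal example only: the corpora are placeholder lists of strings, the variable names are hypothetical, and the paper does not state which tooling was actually used for its lexical analysis.

```python
# Minimal sketch of an n-gram comparison between two corpora (hypothetical data).
# Assumes scikit-learn is available; this is not necessarily the paper's actual tooling.
from sklearn.feature_extraction.text import CountVectorizer


def top_ngrams(messages, n=2, k=20):
    """Return the k most frequent word n-grams in a list of messages."""
    vectorizer = CountVectorizer(ngram_range=(n, n))
    counts = vectorizer.fit_transform(messages)
    totals = counts.sum(axis=0).A1              # total frequency of each n-gram
    vocab = vectorizer.get_feature_names_out()
    return sorted(zip(vocab, totals), key=lambda x: -x[1])[:k]


# Placeholder corpora standing in for FB-Stereotypes and the hoax-reaction tweets.
fb_messages = ["primo messaggio di esempio", "secondo messaggio di esempio"]
hoax_tweets = ["primo tweet di esempio", "secondo tweet di esempio"]

fb_top = dict(top_ngrams(fb_messages))
tweet_top = dict(top_ngrams(hoax_tweets))

# Bigrams prominent in one corpus but absent from the other hint at corpus-specific lexicon.
only_in_fb = set(fb_top) - set(tweet_top)
print(sorted(only_in_fb))
```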

Finally, we provide experiments especially focused on the stereotype category: to evaluate the possibility of automatically labeling stereotypes in a novel set of data, an automatic stereotype detection tool is trained on FB-Stereotypes and on the other resources cited above.
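A minimal sketch of this binary set-up is given below, assuming the annotated texts and their stereotype labels (1 = stereotype present, 0 = absent) are available as Python lists. The TF-IDF plus logistic-regression pipeline is used only for illustration and is not necessarily one of the classifiers employed in the experiments reported later.

```python
# Sketch of a binary stereotype classifier (illustrative baseline, not the paper's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: annotated messages with 1 = stereotype, 0 = no stereotype.
texts = [
    "primo testo annotato di esempio",
    "secondo testo annotato di esempio",
    "terzo testo annotato di esempio",
    "quarto testo annotato di esempio",
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word unigrams and bigrams as features
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Label an unseen message: 1 predicts the presence of a stereotype.
print(model.predict(["nuovo testo da classificare"]))
```

In the actual study, such a classifier would be trained on FB-Stereotypes together with the other annotated resources mentioned above and evaluated on held-out annotated data.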

The paper is organized as follows. The next section surveys related literature on stereotypes and the major computational work on the detection of stereotypes, HS and similar phenomena in social media texts. The third section is focused on the collection of data included in the FB-Stereotypes corpus, while the fourth presents the annotation scheme we designed and applied for making HS, stereotypes and related phenomena explicit. Finally, Sections 5 (Lexical analysis and discussion) and 6 (Experiments) present and discuss the lexical analysis and the computational experiments as the results achieved in this project.

Section snippets

Related work and background

The notions of stereotype and prejudice are often used almost as synonyms, and there exists a close relationship between them that motivates this common use. The stereotype is indeed the cognitive nucleus of prejudice, which in turn takes the form of discrimination and of racist and hateful behaviors in social interactions, such as HS.

A stereotype consists of a firmly held association between a social group and some features, such as a physical, mental, behavioral or occupational quality, e.g. “

Collecting the FB-Stereotypes corpus

The main reference corpus of this study, i.e. FB-Stereotypes, is a collection of 2,990 Italian messages retrieved from Facebook. It is the result of filtering and selection applied to a larger pre-existing dataset (including 12,583 posts) collected through a joint effort of the University of Bari group and that of UniNettuno for social psychology studies on emotions and online unethical dynamics toward the hosting of immigrants (D’Errico and Paciello, 2018, D’Errico et al., 2018).

Annotating HS and stereotypes in the FB-Stereotypes corpus

In this section, we mainly focus on the annotation scheme designed for FB-Stereotypes and its application to this corpus. The first subsection is devoted to the description of the labels used for the annotation of each dimension, together with the constraints we defined for them, while the second concerns their application to the data.

Lexical analysis and discussion

In this section, we present a detailed lexical analysis that will also provide some hints for further investigation. Moreover, in this analysis we exploit small sets of data extracted from another corpus, which has been used as a benchmark in an evaluation campaign for the detection of HS (HaSpeeDe2020).

In order to better understand the linguistic characteristics of the dataset taken into account in the present study, we performed the following lexical analyses:

  • i. listing the most relevant

Experiments

In this section, we aim at shedding some light on the detection of stereotypes by framing it as a binary classification task, that is, the identification of messages in which some stereotype occurs versus those in which it does not. Therefore, we have performed experiments in which we train, on the dataset of Facebook comments described above (FB-Stereotypes) and on a portion of a benchmark dataset of tweets and news headlines (HaSpeeDe2020), some classifiers whose performance has been well

Conclusion and future work

This paper presents and discusses an annotation scheme for making explicit the presence of stereotypes and some related phenomena, such as HS and discredit. The stereotype is indeed the cognitive nucleus of prejudice, which in turn takes the form of discrimination and hateful behavior. Given the relevance, in the last few years, of HS detection and similar tasks, e.g. cyberbullying detection, offensive language and misogyny identification, this paper aims at shedding some novel light on HS by

CRediT authorship contribution statement

Cristina Bosco: Conceptualization, Methodology, Data curation, Writing – original draft, Supervision. Viviana Patti: Conceptualization, Methodology, Data curation, Writing – original draft. Simona Frenda: Software, Formal analysis, Writing – original draft. Alessandra Teresa Cignarella: Data curation, Investigation, Writing – review & editing. Marinella Paciello: Data curation, Writing – review & editing. Francesca D’Errico: Conceptualization, Data curation, Project administration, Funding

Acknowledgments

The work of all the authors is supported by the international project STERHEOTYPES - Studying European Racial Hoaxes and Stereotypes, funded by Volkswagen Stiftung/Compagnia di San Paolo, Italy, under the call for projects “Challenges for Europe” (CUP: B99C20000640007); https://www.irit.fr/sterheotypes/.

References (52)

  • Allport, G. (1954). The nature of prejudice.
  • Álvarez Carmona, M. Á., et al. Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets.
  • Aronson, E., et al. (2013). Social psychology.
  • Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F. M. R., et al. (2019). Semeval-2019 task 5:...
  • Bauwelinck, N., & Lefever, E. (2019). Measuring the Impact of Sentiment for Hate Speech Detection on Twitter. In...
  • Bosco, C., et al. Overview of the EVALITA 2018 hate speech detection task.
  • Bosco, C., Patti, V., Bogetti, M., Conoscenti, M., Ruffo, G., Schifanella, R., et al. (2017). Tools and Resources for...
  • Brown, R. (2011). Prejudice: Its social psychology.
  • Chiril, P., et al. “Be nice to your wife! The restaurants are closed”: Can gender stereotype detection improve sexism classification?
  • Cryan, J., et al.
  • Del Vigna, F., et al. Hate me, hate me not: Hate speech detection on Facebook.
  • D’Errico, F., et al. (2018). Online moral disengagement and hostile emotions in discussions on hosting immigrants. Internet Research.
  • D’Errico, F., et al. Behind our words: Psychological paths underlying the un/supportive stance toward immigrants in social media.
  • D’Errico, F., et al. (2022). ‘Immigrants, hell on board’. Stereotypes and prejudice emerging from racial hoaxes through a psycho-linguistic analysis. Journal of Language and Discrimination.
  • D’Errico, F., et al. (2012). Blame the opponent! Effects of multimodal discrediting moves in public debates. Cognitive Computation.
  • D’Errico, F., et al. (2012). Discrediting signals. A model of social evaluation to study discrediting moves in political debates. Journal on Multimodal User Interfaces.
  • Devlin, J., et al. BERT: Pre-training of deep bidirectional transformers for language understanding.
  • van Dijk, T. A. Racism in the press.
  • Durrheim, K. Implicit prejudice in mind and interaction.
  • Erjavec, K., et al. (2012). “You don’t understand, this is a new war!” Analysis of hate speech in news web sites’ comments. Mass Communication and Society.
  • Fersini, E., et al. Overview of the EVALITA 2018 task on automatic misogyny identification (AMI).
  • Fersini, E., et al. Overview of the task on automatic misogyny identification at IberEval 2018.
  • Fields, C. (2016). Stereotypes and stereotyping: Misperceptions, perspectives and role of social media.
  • Fiske, S. T. Stereotyping, prejudice, and discrimination.
  • Fiske, S. T., et al. (2006). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences.
  • Fortuna, P., et al. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys.