Differences between antisemitic and non-antisemitic English language tweets

  • Original Paper
Computational and Mathematical Organization Theory

Abstract

Antisemitism is a global phenomenon on the rise that is negatively affecting Jews and communities more broadly. It has been argued that social media has opened up new opportunities for antisemites to disseminate material and organize. It is, therefore, necessary to get a picture of the scope and nature of antisemitism on social media. However, identifying antisemitic messages in large datasets is not trivial and more work is needed in this area. In this paper, we present and describe an annotated dataset that can be used to train tweet classifiers. We first explain how we created our dataset and approached identifying antisemitic content by experts. We then describe the annotated data, where 11% of conversations about Jews (January 2019–August 2020) and 13% of conversations about Israel (January–August 2020) were labeled antisemitic. Another important finding concerns lexical differences across queries and labels. We find that antisemitic content often relates to conspiracies of Jewish global dominance, the Middle East conflict, and the Holocaust.

Notes

  1. https://osome.iu.edu.

  2. The asterisk serves as a wildcard in our search query. Thus, the query includes "ZioNazi," "ZioNazis" and "ZioNazism."

  3. IHRA Working Definition of Antisemitism, see https://www.holocaustremembrance.com/resources/working-definitions-charters/working-definition-antisemitism.

  4. https://www.holocaustremembrance.com/resources/working-definitions-charters/working-definition-antisemitism/adoption-endorsement.

  5. The two tweets did not contain enough information to reach an agreement. One of them was a reply to another user that contained only the word "Jews." In the series of tweets, it could mean that Jews were being blamed for something. However, it was unclear at the time the two commenters were discussing it. Both tweets have since been deleted.

  6. For the keyword “ZioNazi*” we only find an agreement of 38%. This is due to one annotator choosing the following paragraph consistently for all antisemitic tweets: “Drawing comparisons of contemporary Israeli policy to that of the Nazis.” This relatively low number is a side effect of our annotation portal only allowing the choice of one Working Definition paragraph when annotating.

  7. S. Bird, E. Klein, and E. Loper (2009), "Natural language processing with Python: analyzing text with the natural language toolkit," O’Reilly Media, Inc., PorterStemming() package.

  8. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825–2830, 2011, CountVectorizer() package.

  9. The token 'palestinian' appears most frequently in both antisemitic messages with 1.46% (n = 18) and non-antisemitic messages with 2.3% (n = 67).

  10. In particular, tweets including the insult "Kikes" show a high usage of emojis and mixed language. Tweets classified as antisemitic span N = 722 unique tokens, of which 34.67% (n = 250) are emojis. In contrast, non-antisemitic tweets span N = 788 unique tokens, of which only 4.82% (n = 38) are emojis.

  11. Figure 2 shows a word graph using the top 50 words by keyword (based on Sklearn’s CountVectorizer term frequency) for edge weights connecting words to the tweet annotation type and the Louvain Modularity algorithm within the Gephi application for grouping. Before graphing, English stop words and keywords were removed. The NLTK Porter Stemmer package was applied for word stemming. This resulted in a total of 238 unique words.

  12. This includes two duplicates in the randomized samples.

  13. The input was the full tweet texts for each corpus, where a corpus is one keyword combined with either the antisemitic or the non-antisemitic label. Sklearn’s CountVectorizer function was used to create a term frequency vector, applying its word analyzer feature and stemming words into tokens. Words used in only one tweet and words used in more than 95% of the tweets were filtered out, and the vocabulary was capped at a maximum of 10,000 words.

  14. For further reference see Yener (2020) Step by Step: Twitter Sentiment Analysis in Python, in Towards Data Science (https://towardsdatascience.com/step-by-step-twitter-sentiment-analysis-in-python-d6f650ade58d), accessed 20 September 2021.

  15. For further API reference see https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis.

  16. Clarifying the concept of “followers” and “friends”: According to Twitter, “friends” are those whom the Twitter user follows (back), and “followers” are those who follow a particular user.

  17. Usernames are only displayed if they have more than 1000 followers or refer to an organization, institute, or NGO. Otherwise, they appear blacked out.

  18. According to the profile's description, the bot slices random text and subsequently publishes it on Twitter. After examining the profile more thoroughly, we can assume that this user is not a human subject.
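
The vocabulary filtering described in note 13 can be illustrated with a minimal sketch. This is not the authors' code: it is a standard-library stand-in for the CountVectorizer step, it omits the Porter stemming, and the function names (`tokenize`, `build_vocabulary`, `count_vectors`), parameter names, and sample texts are all invented for illustration. It assumes that "used in only one tweet" corresponds to a minimum document frequency of 2, and that the 10,000-word cap keeps the most frequent terms.

```python
import re
from collections import Counter

# Words of two or more characters (an assumed tokenization rule).
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(tweets, min_docs=2, max_doc_share=0.95, max_terms=10_000):
    """Return the terms kept after document-frequency filtering.

    A term is dropped if it occurs in fewer than `min_docs` tweets or in
    more than `max_doc_share` of all tweets. The survivors are capped at
    the `max_terms` most frequent terms.
    """
    doc_freq = Counter()   # number of tweets containing each term
    term_freq = Counter()  # total occurrences of each term
    for tweet in tweets:
        tokens = tokenize(tweet)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    n_docs = len(tweets)
    kept = [
        term for term, df in doc_freq.items()
        if min_docs <= df <= max_doc_share * n_docs
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    kept.sort(key=lambda term: (-term_freq[term], term))
    return kept[:max_terms]


def count_vectors(tweets, vocabulary):
    """Map each tweet to a list of term counts aligned with `vocabulary`."""
    rows = []
    for tweet in tweets:
        counts = Counter(tokenize(tweet))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Invented placeholder texts, not taken from the dataset.
sample_tweets = [
    "the report was shared widely",
    "the report cites new data",
    "the data was shared again",
    "the claim was repeated",
]
vocabulary = build_vocabulary(sample_tweets)
vectors = count_vectors(sample_tweets, vocabulary)
print(vocabulary)
print(vectors)
```

In this toy corpus, "the" appears in every text and is removed by the 95% ceiling, while words that appear in a single text are removed by the minimum document frequency. Only the remaining terms receive a column in the count vectors.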

Acknowledgements

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. We are grateful that we were able to use Indiana University’s Observatory on Social Media (OSoMe) tool and data (Davis et al. 2016). This research was supported by the Koret Foundation.

Author information

Corresponding author

Correspondence to Gunther Jikeli.

About this article

Cite this article

Jikeli, G., Axelrod, D., Fischer, R.K. et al. Differences between antisemitic and non-antisemitic English language tweets. Comput Math Organ Theory 30, 232–266 (2024). https://doi.org/10.1007/s10588-022-09363-2

  • DOI: https://doi.org/10.1007/s10588-022-09363-2
