short-paper

Language agnostic model: detecting islamophobic content on social media

Authors:
Heena Khan, Middle Tennessee State University
Joshua L. Phillips, Middle Tennessee State University

ACM SE '21: Proceedings of the 2021 ACM Southeast Conference, April 2021, Pages 229–233. https://doi.org/10.1145/3409334.3452077

Published: 10 May 2021

ABSTRACT

Social media platforms can struggle to enforce rules against online abuse and hate speech because of the large amount of content that must be manually reviewed. Machine learning approaches have been proposed in the literature as a way to automate much of this work, but social content in multiple languages further complicates the problem. Past work has focused on first building word embeddings in the target language, which limits the application of such embeddings to other languages. We use the Google Neural Machine Translator (NMT) to identify and translate non-English text to English, making the system language agnostic. We can therefore use readily available pre-trained word embeddings instead of training our models and word embeddings in each language. We experimented with different word-embedding and classifier pairs to assess whether translated English data gives accuracy comparable to an untranslated English dataset. Our best-performing model, SVM with TF-IDF, achieved a 10-fold accuracy of 95.56 percent, followed by the BERT model with a 10-fold accuracy of 94.66 percent on the translated data. This accuracy is close to that obtained on the untranslated English dataset and far better than that obtained on the untranslated Hindi dataset.
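
The translate-then-classify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `translate` callable is a hypothetical stand-in for the NMT step (here an identity function), the linear SVM is a small Pegasos-style trainer written for illustration rather than a library SVM, the toy texts use benign placeholder vocabulary, and 5 folds are used instead of the paper's 10 because the toy set has only ten items.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Learn smoothed inverse document frequencies from tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """Turn one tokenized doc into an L2-normalized sparse TF-IDF vector."""
    tf = Counter(t for t in doc if t in idf)  # terms unseen in training are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, x):
    """Dot product between sparse weights and a sparse feature vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are -1 or +1."""
    rng = random.Random(seed)
    w, step = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = y[i] * score(w, X[i]) < 1.0
            for t in w:                      # shrink step from the L2 regularizer
                w[t] *= 1.0 - 1.0 / step
            if violated:                     # hinge-loss subgradient step
                for t, v in X[i].items():
                    w[t] = w.get(t, 0.0) + eta * y[i] * v
    return w


def k_fold_accuracy(texts, labels, k, translate=lambda s: s):
    """Mean held-out accuracy over k folds; `translate` stands in for the NMT step."""
    docs = [translate(s).lower().split() for s in texts]
    accuracies = []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        idf = fit_idf([docs[i] for i in train])  # IDF is fit on training folds only
        w = train_linear_svm([tfidf(docs[i], idf) for i in train],
                             [labels[i] for i in train])
        hits = sum((1 if score(w, tfidf(docs[i], idf)) >= 0 else -1) == labels[i]
                   for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k


# Toy data with placeholder vocabulary: +1 = flagged, -1 = not flagged.
texts = [
    "rude hostile", "kind friendly",
    "hostile angry", "friendly calm",
    "insulting hostile rude", "helpful friendly kind",
    "angry hostile insulting", "calm friendly helpful",
    "hostile rude angry", "friendly kind calm",
]
labels = [1, -1] * 5

acc = k_fold_accuracy(texts, labels, k=5)
print(f"mean accuracy over 5 folds: {acc:.2f}")
```

In a real pipeline, `translate` would wrap a call to a translation service and the classifier would be trained on a labeled corpus; fitting the IDF weights inside each fold, as above, keeps held-out text from leaking into the features.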

Index Terms

Language agnostic model: detecting islamophobic content on social media
1. Applied computing
  1. Law, social and behavioral sciences
    1. Sociology
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
ACM SE '21: Proceedings of the 2021 ACM Southeast Conference
April 2021
263 pages
ISBN: 9781450380683
DOI: 10.1145/3409334
Conference Chair: Kazi Rahman, Jacksonville State University
Program Chair: Eric Gamess, Jacksonville State University, Jacksonville, Alabama, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dataset
islamophobia
natural language processing
sentiment analysis
social media
text classification
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 178 of 377 submissions, 47%
Article Metrics
- Total Citations: 5
- Total Downloads: 100
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 0