DOI: 10.1145/3409334.3452077
short-paper

Language agnostic model: detecting islamophobic content on social media

Published: 10 May 2021

ABSTRACT

Social media platforms can struggle to enforce rules against online abuse and hate speech because of the large amount of content that must be reviewed manually. Machine learning approaches have been proposed in the literature as a way to automate much of this labor, but social content in multiple languages further complicates the issue. Past work has focused on first building word embeddings in the target language, which limits the application of such embeddings to other languages. We use the Google Neural Machine Translator (NMT) to identify non-English text and translate it to English, making the system language agnostic. We can therefore use already available pre-trained word embeddings instead of training our models and word embeddings separately for each language. We experimented with different word-embedding and classifier pairs to assess whether translated English data gives accuracy comparable to an untranslated English dataset. Our best-performing model, SVM with TF-IDF, gave a 10-fold accuracy of 95.56 percent on the translated data, followed by the BERT model with a 10-fold accuracy of 94.66 percent. This accuracy is close to that obtained on the untranslated English dataset and far better than that obtained on the untranslated Hindi dataset.
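
The translate-then-classify pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: `translate_to_english` is a hypothetical stub standing in for the translation step, the linear SVM is trained with a hand-written Pegasos-style sub-gradient routine instead of whatever SVM library the paper used, "10-fold accuracy" is read as 10-fold cross-validation, and the texts are synthetic placeholders, not the paper's dataset.

```python
import math
import random
from collections import Counter


def translate_to_english(text):
    # Hypothetical stand-in for the translation step: a real system would
    # call a translation service here. This stub returns the text unchanged.
    return text


def tokenize(text):
    return text.lower().split()


def fit_idf(token_lists):
    # Smoothed inverse document frequency, learned from training folds only.
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(tokens, idf):
    # Sparse, L2-normalised TF-IDF vector; tokens unseen in training are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic sub-gradient descent on the hinge loss.
    # Labels must be +1 or -1.
    rng = random.Random(seed)
    w, step = {}, 0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            x, y = vectors[i], labels[i]
            violated = y * score(w, x) < 1.0
            for t in w:                      # regularisation shrink
                w[t] *= 1.0 - eta * lam
            if violated:                     # hinge sub-gradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if score(w, x) >= 0.0 else -1


def cross_val_accuracy(texts, labels, k=10, seed=0):
    # Mean accuracy over k folds; needs at least k examples.
    docs = [tokenize(translate_to_english(t)) for t in texts]
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    fold_scores = []
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        idf = fit_idf([docs[i] for i in train])
        w = train_linear_svm([tfidf(docs[i], idf) for i in train],
                             [labels[i] for i in train])
        hits = sum(predict(w, tfidf(docs[i], idf)) == labels[i] for i in test)
        fold_scores.append(hits / len(test))
    return sum(fold_scores) / k


# Synthetic placeholder data, not drawn from the paper's dataset.
texts = ([f"abusive insult aimed at group number {i}" for i in range(10)]
         + [f"friendly note about lunch plan number {i}" for i in range(10)])
labels = [1] * 10 + [-1] * 10
accuracy = cross_val_accuracy(texts, labels, k=10)
print(f"10-fold accuracy on toy data: {accuracy:.2f}")
```

In this sketch the IDF weights are refit inside each fold so that held-out documents do not influence the features. Replacing the stub with a real translation call leaves the rest of the pipeline unchanged, which is the sense in which the approach is language agnostic.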


        Published in

        ACM SE '21: Proceedings of the 2021 ACM Southeast Conference
        April 2021
        263 pages
        ISBN: 9781450380683
        DOI: 10.1145/3409334
        Conference Chair: Kazi Rahman
        Program Chair: Eric Gamess

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 178 of 377 submissions, 47%
