DOI: 10.1145/3501247.3531579
research-article

Hostility Detection in Online Hindi-English Code-Mixed Conversations

Published: 26 June 2022

ABSTRACT

With the rise in accessibility and popularity of social media platforms, people increasingly express and communicate their ideas, opinions, and interests online. While these platforms are active sources of entertainment and idea-sharing, they attract hostile and offensive content as well. Identifying hostile posts is an essential and challenging task. Hindi-English Code-Mixed online posts of a conversational nature (which have a hierarchy of posts, comments, and replies) make the task harder still. There are two major challenges: (1) the complex structure of Code-Mixed text and (2) filtering the relevant previous context for a given utterance. To overcome these challenges, we propose a novel hierarchical neural network architecture to identify hostile posts, comments, and replies in online Hindi-English Code-Mixed conversations. We leverage large multilingual pre-trained (mLPT) models such as mBERT, XLMR, and MuRIL. The mLPT models provide a rich representation of Code-Mixed text, and hierarchical modeling leads to a natural abstraction and selection of the relevant context. The proposed model consistently outperformed all the baselines and emerged as the state-of-the-art model. We conducted multiple analyses and ablation studies to demonstrate the robustness of the proposed model.
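
The hierarchical idea described in the abstract (encode each utterance, then select the relevant earlier context for a target reply) can be illustrated with a minimal sketch. This is not the authors' architecture: the hashed bag-of-words `encode_utterance` is a toy stand-in for an mLPT encoder such as mBERT, and the dot-product attention, the names `attend_context` and `represent`, and the size `DIM` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

DIM = 16  # toy embedding size (assumption, not from the paper)


def encode_utterance(text: str) -> List[float]:
    """Toy stand-in for an mLPT encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        # Deterministic toy hash so results are reproducible across runs.
        vec[sum(ord(c) for c in tok) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_context(
    target: List[float], context: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Weight each earlier utterance by its dot-product similarity to the target."""
    if not context:
        return [0.0] * DIM, []
    scores = [sum(t * c for t, c in zip(target, ctx)) for ctx in context]
    weights = softmax(scores)
    pooled = [
        sum(w * ctx[i] for w, ctx in zip(weights, context)) for i in range(DIM)
    ]
    return pooled, weights


def represent(thread: List[str]) -> List[float]:
    """thread = [post, comment, ..., target]; returns target vector + pooled context."""
    *history, target = thread
    target_vec = encode_utterance(target)
    pooled, _ = attend_context(target_vec, [encode_utterance(u) for u in history])
    return target_vec + pooled  # list concatenation, length 2 * DIM
```

In a real system the concatenated vector would feed a classifier head, and the utterance encoder would be a fine-tuned multilingual transformer rather than a hash; the sketch only shows how attention over earlier utterances lets the target pick out the context most related to it.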

Supplemental Material

WS22_S7_114.mp4 (mp4, 846.2 MB)

Published in

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022
June 2022
479 pages
ISBN: 9781450391917
DOI: 10.1145/3501247

Copyright © 2022 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 218 of 875 submissions, 25%
