Hostility Detection in Online Hindi-English Code-Mixed Conversations

Authors:
Aditi Bagora, Indian Institute of Technology Hyderabad, India
Kamal Shrestha, Indian Institute of Technology Hyderabad, Nepal
Kaushal Maurya, Indian Institute of Technology Hyderabad, India
Maunendra Sankar Desarkar, Indian Institute of Technology Hyderabad, India

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022, June 2022, Pages 390–400
https://doi.org/10.1145/3501247.3531579

Published: 26 June 2022

ABSTRACT

With the growing accessibility and popularity of social media platforms, people increasingly express and share their ideas, opinions, and interests online. While these platforms are active sources of entertainment and idea-sharing, they also attract hostile and offensive content. Identifying hostile posts is an essential and challenging task. Conversational Hindi-English Code-Mixed posts, which form a hierarchy of posts, comments, and replies, make the task harder still. Two challenges stand out: (1) the complex structure of Code-Mixed text and (2) filtering the relevant previous context for a given utterance. To address them, we propose a novel hierarchical neural network architecture to identify hostile posts, comments, and replies in online Hindi-English Code-Mixed conversations. We leverage large multilingual pre-trained (mLPT) models such as mBERT, XLMR, and MuRIL. The mLPT models provide a rich representation of Code-Mixed text, and hierarchical modeling leads to a natural abstraction and selection of the relevant context. The proposed model consistently outperformed all the baselines and emerged as the state-of-the-art model. We conducted multiple analyses and ablation studies to demonstrate the robustness of the proposed model.
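
The abstract does not spell out the architecture, so the following is only a rough sketch of the general idea of hierarchical context selection, not the authors' model. It assumes a thread stored as a parent-linked tree of posts, comments, and replies, and uses plain dot-product attention (a technique swapped in for illustration) to weight the ancestors of a target utterance. The names `Utterance`, `context_chain`, `toy_encode`, `attend`, and `build_features` are hypothetical, and `toy_encode` is a hashed bag-of-tokens placeholder standing in for sentence embeddings that an mLPT model such as mBERT, XLMR, or MuRIL would supply.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Utterance:
    """One node in a conversation thread: a post, a comment, or a reply."""
    uid: str
    text: str
    parent: Optional[str] = None  # uid of the parent utterance; None for a root post


def context_chain(thread: Dict[str, Utterance], uid: str) -> List[Utterance]:
    """Return the ancestors of `uid`, ordered from the root post downwards."""
    chain = []
    parent = thread[uid].parent
    while parent is not None:
        chain.append(thread[parent])
        parent = thread[parent].parent
    return chain[::-1]


def toy_encode(text: str, dim: int = 8) -> List[float]:
    """Placeholder encoder (assumption): hashed bag of tokens, L2-normalised.

    A real system would use embeddings from a multilingual pre-trained model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket; Python's built-in hash() is salted per process.
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target: List[float],
           context: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Dot-product attention of the target over its context vectors.

    Returns the attention weights and the weighted context summary.
    With no context, the summary is a zero vector.
    """
    if not context:
        return [], [0.0] * len(target)
    scores = [sum(t * c for t, c in zip(target, ctx)) for ctx in context]
    weights = softmax(scores)
    summary = [sum(w * ctx[i] for w, ctx in zip(weights, context))
               for i in range(len(target))]
    return weights, summary


def build_features(thread: Dict[str, Utterance],
                   uid: str) -> Tuple[List[float], List[float]]:
    """Concatenate the target encoding with its attention-weighted context."""
    target = toy_encode(thread[uid].text)
    context = [toy_encode(u.text) for u in context_chain(thread, uid)]
    weights, summary = attend(target, context)
    return weights, target + summary


# Example thread: a post, a comment on it, and a reply to the comment.
thread = {u.uid: u for u in [
    Utterance("p1", "yeh movie bahut acchi thi"),
    Utterance("c1", "bilkul bakwas movie thi total waste of time", parent="p1"),
    Utterance("r1", "tum khud bakwas ho", parent="c1"),
]}
weights, features = build_features(thread, "r1")
```

In this sketch the concatenated `features` vector would be passed to a classifier, and `weights` shows how much each ancestor contributed to the context summary. A trained model would learn both the encoder and the attention parameters rather than relying on raw dot products.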

Supplemental Material

WS22_S7_114.mp4 (mp4, 846.2 MB)

Index Terms

Hostility Detection in Online Hindi-English Code-Mixed Conversations
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Index terms have been assigned to the content through auto-classification.

Published in
WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022
June 2022
479 pages
ISBN: 9781450391917
DOI: 10.1145/3501247
General Chairs: Ricardo Baeza-Yates (Northeastern University, MA, USA & Universitat Pompeu Fabra, Spain), Katrin Weller (GESIS & Center for Advanced Internet Studies, Germany)
Organizing Chair: Manuel Portela (Universitat Pompeu Fabra, Spain)
Program Chairs: Oshani Seneviratne (Rensselaer Polytechnic Institute, NY, USA), Ingmar Weber (Qatar Computing Research Institute, Qatar), Taha Yasseri (University College Dublin, Ireland)
Publications Chairs: Anna Bon (Vrije Universiteit Amsterdam, Netherlands), Srinath Srinivas (International Institute of Information Technology, Bangalore, India), Luis-Daniel Ibáñez (University of Southampton, UK)
Copyright © 2022 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Code-Mixed data
Neural networks
hostility detection
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 218 of 875 submissions, 25%