ABSTRACT
With the widespread increase in hateful content on the web, hate detection has become more crucial than ever. Although a vast literature exists on hate detection from text, images and videos, there has been no previous work on hateful comment detection (HCD) from video pages. HCD is critical for comment moderation and for flagging controversial videos. Comments are often short, contextual and convoluted, making the problem challenging. Toward solving this problem, we contribute a dataset, HateComments, consisting of 2071 comments for 401 videos obtained from two popular video sharing platforms. We investigate two related tasks: binary HCD and 4-class multi-label hate target-type prediction (HTP). We systematically explore the importance of various forms of context for effective HCD. Our initial experiments show that our best method, which leverages rich video context (such as description, transcript and visual input), achieves an HCD accuracy of ~78.6% and an ROC AUC score of ~0.61 for HTP. Code and data are available at https://drive.google.com/file/d/1EUbWDUokv1CYkWKlwByUC6yIuBGUw2MN/.
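The two metrics reported above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy labels and scores are invented, the helper names (`accuracy`, `roc_auc`, `macro_roc_auc`) are hypothetical, and macro-averaging the per-label ROC AUC is only one plausible reading, since the abstract does not state which averaging scheme is used for HTP. It is not the authors' evaluation code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of comments whose binary hate label is predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_roc_auc(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Unweighted mean of per-label ROC AUC (assumed averaging scheme)."""
    n_labels = len(y_true[0])
    per_label = [
        roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Toy binary HCD labels and predictions (invented for illustration).
hcd_true = [1, 0, 1, 1, 0]
hcd_pred = [1, 0, 0, 1, 0]
hcd_accuracy = accuracy(hcd_true, hcd_pred)

# Toy 4-class multi-label HTP data: one row per comment, one column per
# target type. The target types are left unnamed because the abstract
# does not list them.
htp_true = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 1]]
htp_score = [
    [0.9, 0.2, 0.1, 0.4],
    [0.3, 0.8, 0.2, 0.5],
    [0.1, 0.3, 0.7, 0.2],
    [0.6, 0.4, 0.3, 0.7],
]
htp_auc = macro_roc_auc(htp_true, htp_score)

print(f"HCD accuracy: {hcd_accuracy:.3f}")
print(f"HTP macro ROC AUC: {htp_auc:.4f}")
```

Because HTP is multi-label, each comment can carry several target types at once, so the sketch scores every label column independently instead of picking a single predicted class.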
Index Terms
- Hateful Comment Detection and Hate Target Type Prediction for Video Comments