ABSTRACT
This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for detecting offensive language and hate speech. It creates test collections for languages with few resources, with English included for comparison. The first track within HASOC continued the work from 2019 and provided a testbed of Twitter posts in Hindi, German and English. The second track created test resources for Tamil and Malayalam in native and Latin script, with posts extracted mainly from YouTube and Twitter. Both tracks attracted much interest: over 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data and the main results.