Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German

Authors:
Thomas Mandl, University of Hildesheim, Germany
Sandip Modha, LDRP Institute of Technology and Research, India
Anand Kumar M, National Institute of Technology Karnataka, India
Bharathi Raja Chakravarthi, National University of Ireland, Ireland

FIRE '20: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation, December 2020, Pages 29–32
https://doi.org/10.1145/3441501.3441517

Published: 17 January 2021

ABSTRACT

This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for identifying Offensive Language and Hate Speech. It creates test collections for languages with few resources, with English included for comparison. The first track within HASOC continued the work from 2019 and provided a testbed of Twitter posts in Hindi, German and English. The second track within HASOC created test resources for Tamil and Malayalam in both native and Latin script. Posts were extracted mainly from YouTube and Twitter. Both tracks attracted considerable interest: more than 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data and the main results.

Published in

FIRE '20: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2020
70 pages
ISBN: 9781450389785
DOI: 10.1145/3441501

Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hate speech
datasets
deep learning
evaluation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 19 of 64 submissions, 30%
