Comparison of Deep Learning Methods in Detecting Hate Speech in Indonesian Tweets

Authors:

Dwija Wisnu Brata
Department of Informatics, Institut Teknologi Sepuluh Nopember, Indonesia and Department of Information System, Brawijaya University, Indonesia
ORCID: 0009-0006-1754-6466

Arif Djunaidy
Department of Information System, Institut Teknologi Sepuluh Nopember, Indonesia
ORCID: 0000-0001-6078-5610

Daniel Oranova Siahaan
Department of Informatics, Institut Teknologi Sepuluh Nopember, Indonesia
ORCID: 0000-0001-6560-2975

SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology, October 2023, Pages 58–63. https://doi.org/10.1145/3626641.3626925

Published: 27 December 2023

ABSTRACT

Hate speech has negative effects on both its targets and those who witness it. It spreads not only through physical or verbal acts but also in writing on social media, where it can be difficult to identify. Currently, hate speech detection relies on machine learning. This study generates vector representations of words using three pre-trained word embedding models: Global Vectors (GloVe), FastText, and Bidirectional Encoder Representations from Transformers (BERT). Synthetic Minority Oversampling Technique (SMOTE) and Random Over Sampling (ROS) were used as balancing methods to correct the imbalance between classes. Three deep learning architectures were used to identify sentence-level hate speech in Indonesian tweets: Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). The dataset was collected by crawling tweets through the Twitter API. After preprocessing, features were extracted. In the experiments, the classifier combining an RNN with BERT embeddings and SMOTE achieved the highest accuracy (95.5%).
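
The two balancing methods named in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the function names, the toy two-dimensional feature vectors, and the parameters `k` and `seed` are assumptions made for illustration, and in the study the inputs would be embedding vectors rather than hand-written points. ROS duplicates existing minority samples, whereas SMOTE creates new samples by interpolating between a minority sample and one of its nearest same-class neighbours. In practice a library such as imbalanced-learn would normally be used instead.

```python
import math
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """ROS: copy randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))  # exact duplicate of an existing row
            y_out.append(label)
    return X_out, y_out


def smote(X, y, k=5, seed=0):
    """SMOTE sketch: synthesize rows on the segment between a minority row and a near neighbour."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(rows) < 2:
            continue  # nothing to add, or no neighbour to interpolate toward
        for _ in range(target - n):
            base = rng.choice(rows)
            # up to k nearest same-class neighbours by Euclidean distance, excluding the base row
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): six majority rows (label 0) and three minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0] * 6 + [1] * 3

X_ros, y_ros = random_over_sample(X, y)
X_sm, y_sm = smote(X, y, k=2)
print(Counter(y_ros), Counter(y_sm))
```

Both functions return a dataset in which every class has as many rows as the largest class. The difference is in the added rows: ROS only repeats points already in the data, while the SMOTE sketch produces new points that lie between existing minority points.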

Published in
SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
October 2023
722 pages
ISBN: 9798400708503
DOI: 10.1145/3626641
Editors:
Edita Rosana Widasari,
Putra Pandu Adikara
Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Deep Learning
Hate Speech
Imbalance Data
Indonesia Tweets
Word Embedding
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 45 of 57 submissions, 79%