DOI: 10.1145/3626641.3626925

Comparison of Deep Learning Methods in Detecting Hate Speech in Indonesian Tweets

Published: 27 December 2023

ABSTRACT

Hate speech has negative effects on both its targeted victims and its listeners. It can be disseminated not only physically or verbally but also in writing on social media, where it can be difficult to identify. Hate speech detection currently relies mainly on machine learning. This study generates vector representations of words using three pre-trained word embedding models: Global Vectors (GloVe), FastText, and Bidirectional Encoder Representations from Transformers (BERT). The Synthetic Minority Oversampling Technique (SMOTE) and Random Over Sampling (ROS) were used as balancing methods to correct the class imbalance in the data. In addition, three deep learning architectures were used to identify sentence-level hate speech in Indonesian tweets: Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). The dataset was collected by crawling tweets through the Twitter API. After the data were preprocessed, features were extracted. The experimental results show that the classifier combining an RNN with BERT embeddings and SMOTE achieved the highest accuracy (95.5%).
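The two balancing methods named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes a binary task (hate speech vs. not hate speech) in which every tweet has already been turned into a fixed-length feature vector, for example by pooling its word embeddings; the abstract does not say how the authors pooled embeddings or which SMOTE parameters they used. The function names `random_oversample`, `k_nearest`, `smote`, and `balance`, and the toy data, are hypothetical illustrations, not the authors' code. ROS duplicates existing minority samples, whereas SMOTE creates new ones by interpolating between a minority sample and one of its k nearest minority-class neighbours.

```python
# Hypothetical sketch of ROS and SMOTE, not the authors' implementation.
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_oversample(minority: Sequence[Vector], n_new: int,
                      rng: random.Random) -> List[Vector]:
    """ROS: duplicate randomly chosen minority-class vectors."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def k_nearest(idx: int, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    ranked = sorted((math.dist(points[idx], p), j)
                    for j, p in enumerate(points) if j != idx)
    return [j for _, j in ranked[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int,
          rng: random.Random) -> List[Vector]:
    """SMOTE: place each synthetic vector at a random point on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic


def balance(X: Sequence[Vector], y: Sequence[int], minority_label: int,
            method: str = "smote", k: int = 5,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority class of a binary dataset until both
    classes have the same number of samples."""
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    if method == "smote":
        new = smote(minority, n_new, k, rng)
    elif method == "ros":
        new = random_oversample(minority, n_new, rng)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return [list(x) for x in X] + new, list(y) + [minority_label] * len(new)


# Hypothetical toy data: label 1 = hate speech (minority), 0 = not hate speech.
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
     [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.3, 5.0], [4.9, 4.7]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance(X, y, minority_label=1, method="smote", k=2)
print(y_bal.count(1), y_bal.count(0))  # prints: 5 5
```

In practice, the imbalanced-learn library provides `SMOTE` and `RandomOverSampler` in `imblearn.over_sampling`. Whichever implementation is used, oversampling is normally applied to the training split only, so that synthetic or duplicated samples do not leak into the evaluation data.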


Published in
SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
October 2023, 722 pages
ISBN: 9798400708503
DOI: 10.1145/3626641

Copyright © 2023 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article; Research; Refereed limited

Acceptance Rates
Overall acceptance rate: 45 of 57 submissions, 79%

Article Metrics
Downloads (last 12 months): 24
Downloads (last 6 weeks): 1
