FastThaiCaps: A Transformer Based Capsule Network for Hate Speech Detection in Thai Language

Maity, Krishanu; Bhattacharya, Shaubhik; Saha, Sriparna; Janoai, Suwika; Pasupa, Kitsuchart

doi:10.1007/978-3-031-30108-7_36


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13624)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The advent of technology has led to people sharing their views more openly than ever before. In parallel, cyberbullying and hate speech content have also increased as a side effect that is potentially hazardous to society. While plenty of research is under way on detecting online hate speech in English, there is very little research on the Thai language. To investigate how noisy Thai posts can be handled effectively, in this work we have developed FastThaiCaps, a two-channel deep learning model based on BERT and FastText embeddings along with a capsule network. One channel takes its input from the BERT language model, and the other from the pre-trained FastText embedding. Our model has been evaluated on a benchmark Thai dataset with four categories: peace speech, neutral speech, level-1 hate speech, and level-2 hate speech. Experiments show that FastThaiCaps outperforms state-of-the-art methods by up to 3.11% in terms of F1 score.
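
The abstract names the building blocks (two input channels, a capsule network, four output classes) but not how they are wired together. The following is a minimal sketch of one way they could fit, not the authors' implementation. It assumes the two channel outputs are fused by concatenation, uses the squash nonlinearity and routing-by-agreement from Sabour, Frosst and Hinton (2017), and replaces trained weights and real BERT/FastText features with seeded random weights and toy vectors. The names `classify`, `dynamic_routing`, `in_dim`, and `out_dim` are hypothetical.

```python
import math
import random

# The four classes named in the abstract.
LABELS = ["peace", "neutral", "level-1 hate", "level-2 hate"]


def squash(s, eps=1e-9):
    """Capsule nonlinearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement; u_hat[i][j] is input capsule i's prediction for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        coupling = [softmax(row) for row in logits]
        v = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit wherever a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v


def classify(bert_features, fasttext_features, in_dim=4, out_dim=8, seed=0):
    """Fuse two channels by concatenation (an assumption), then route to class capsules."""
    fused = list(bert_features) + list(fasttext_features)
    # Split the fused vector into primary capsules of size in_dim.
    primary = [squash(fused[k:k + in_dim]) for k in range(0, len(fused), in_dim)]
    rng = random.Random(seed)  # untrained random weights, for illustration only
    u_hat = []
    for cap in primary:
        per_class = []
        for _ in LABELS:
            w = [[rng.uniform(-1, 1) for _ in cap] for _ in range(out_dim)]
            per_class.append([sum(wd * cd for wd, cd in zip(row, cap)) for row in w])
        u_hat.append(per_class)
    class_caps = dynamic_routing(u_hat)
    # The length of each class capsule acts as that class's score.
    lengths = [math.sqrt(sum(x * x for x in v)) for v in class_caps]
    return LABELS[lengths.index(max(lengths))], lengths


# Toy stand-ins for the pooled outputs of the two channels.
label, lengths = classify([0.2, -0.1, 0.4, 0.3], [0.5, 0.1, -0.3, 0.2])
print(label, [round(x, 3) for x in lengths])
```

Because the weights are random, the predicted label here carries no meaning; the sketch only shows the data flow from two feature channels through routing to four class capsules whose lengths are compared.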

Acknowledgement

This work was supported by the Ministry of External Affairs (MEA) and the Department of Science & Technology (DST), India, under the ASEAN-India Collaborative R&D Scheme. The authors would also like to acknowledge the support of the Ministry of Home Affairs (MHA), India, for conducting this research.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Krishanu Maity, Shaubhik Bhattacharya & Sriparna Saha
School of Information Technology, King Mongkut’s Institute of Technology, Ladkrabang, Bangkok, 10520, Thailand
Suwika Janoai & Kitsuchart Pasupa

Corresponding author

Correspondence to Krishanu Maity.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Maity, K., Bhattacharya, S., Saha, S., Janoai, S., Pasupa, K. (2023). FastThaiCaps: A Transformer Based Capsule Network for Hate Speech Detection in Thai Language. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13624. Springer, Cham. https://doi.org/10.1007/978-3-031-30108-7_36

DOI: https://doi.org/10.1007/978-3-031-30108-7_36
Published: 13 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30107-0
Online ISBN: 978-3-031-30108-7
eBook Packages: Computer Science, Computer Science (R0)
