Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai

Maity, Krishanu; Bhattacharya, Shaubhik; Phosit, Salisa; Kongsamlit, Sawarod; Saha, Sriparna; Pasupa, Kitsuchart

doi:10.1007/978-3-031-43427-3_9

Krishanu Maity¹³,
Shaubhik Bhattacharya¹³,
Salisa Phosit¹⁴,
Sawarod Kongsamlit¹⁴,
Sriparna Saha¹³ &
…
Kitsuchart Pasupa ORCID: orcid.org/0000-0001-8359-9888¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14174))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

881 Accesses

Abstract

Social media platforms have both positive and negative impacts on users in diverse societies. One of the adverse effects of social media platforms is the usage of hate and offensive language, which not only fosters prejudice but also harms the vulnerable. Additionally, a person’s sentiment and emotional state heavily influence the intended content of any social media post. Despite extensive research being conducted to detect online hate speech in English, there is a lack of similar studies on low-resource languages such as Thai. The recent enactment of laws like the “right to explanations” in the General Data Protection Regulation has stimulated the development of interpretable models rather than solely focusing on performance. Motivated by this, we created the first benchmark hate speech corpus, called Ex-ThaiHate, in the Thai language. Each post is annotated with four labels, namely hate, sentiment, emotion, and rationales (explainability), which specify the phrases that are responsible for annotating the post as hate. In order to investigate the effect of sentiment and emotional information on detecting hate speech posts, we propose a unified generative framework called GenX, which redefines this multi-task problem as a text-to-text generation task to simultaneously solve four tasks: hate-speech identification, rationale detection, sentiment, and emotion detection. Our extensive experiments demonstrate that GenX significantly outperforms all baselines and state-of-the-art models, thereby highlighting its effectiveness in detecting hate speech and identifying the rationales in low-resource languages. The code and dataset are available at https://github.com/dsmlr/Ex-ThaiHate.

Disclaimer: The article contains offensive text and profanity. This is due to the nature of the work and does not reflect any opinion or stance of the authors.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://pythainlp.github.io/docs/2.2/.

References

Agrawal, S., Awekar, A.: Deep learning for detecting cyberbullying across multiple social media platforms. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 141–153. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76941-7_11
Chapter Google Scholar
Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Barrett, R., Cummings, R., Agichtein, E., Gabrilovich, E. (eds.) Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3–7, 2017, pp. 759–760. ACM (2017). https://doi.org/10.1145/3041021.3054223
Balakrishnan, V., Khan, S., Arabnia, H.R.: Improving cyberbullying detection using twitter users’ psychological features and machine learning. Comput. Secur. 90, 101710 (2020). https://doi.org/10.1016/j.cose.2019.101710
Article Google Scholar
Camburu, O., Rocktäschel, T., Lukasiewicz, T., Blunsom, P.: e-snli: natural language inference with natural language explanations. In: Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, pp. 9560–9572 (2018)
Google Scholar
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997). https://doi.org/10.1023/A:1007379606734
Article MathSciNet Google Scholar
Chan, T.K.H., Cheung, C.M.K., Wong, R.Y.M.: Cyberbullying on social networking sites: the crime opportunity and affordance perspectives. J. Manag. Inf. Syst. 36(2), 574–609 (2019). https://doi.org/10.1080/07421222.2019.1599500
Article Google Scholar
Crawshaw, M.: Multi-task learning with deep neural networks: a survey. CoRR abs/2009.09796 (2020)
Google Scholar
Dadvar, M., Trieschnigg, D., de Jong, F.: Experts and machines against bullies: a hybrid approach to detect cyberbullies. In: Sokolova, M., van Beek, P. (eds.) AI 2014. LNCS (LNAI), vol. 8436, pp. 275–281. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06483-3_25
Chapter Google Scholar
Dinakar, K., Reichart, R., Lieberman, H.: Modeling the detection of textual cyberbullying. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 5, pp. 11–17 (2021). https://doi.org/10.1609/icwsm.v5i3.14209
European Parliament and of the Council: Protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec. EC General Data Protection Regulation 679 (2016)
Google Scholar
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)
Article Google Scholar
Ghosh, S., Roy, S., Ekbal, A., Bhattacharyya, P.: CARES: CAuse recognition for emotion in suicide notes. In: Hagen, M., et al. (eds.) ECIR 2022. LNCS, vol. 13186, pp. 128–136. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99739-7_15
Chapter Google Scholar
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., Yang, G.: XAI - explainable artificial intelligence. Sci. Robot. 4(37) (2019). https://doi.org/10.1126/scirobotics.aay7120
Ibrohim, M.O., Budi, I.: Multi-label hate speech and abusive language detection in Indonesian Twitter. In: Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, pp. 46–57. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/W19-3506
Karim, M.R., et al.: Deephateexplainer: explainable hate speech detection in under-resourced Bengali language. In: 8th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2021, Porto, Portugal, October 6–9, 2021, pp. 1–10. IEEE (2021). https://doi.org/10.1109/DSAA53316.2021.9564230
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020, pp. 7871–7880. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.703
Liu, B., Lane, I.R.: Attention-based recurrent neural network models for joint intent detection and slot filling. In: Morgan, N. (ed.) Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, 8–12 September 2016, pp. 685–689. ISCA (2016). https://doi.org/10.21437/Interspeech.2016–1352
Maity, K., Bhattacharya, S., Saha, S., Janoai, S., Pasupa, K.: Fastthaicaps: a transformer based capsule network for hate speech detection in Thai language. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds.) ICONIP 2022, Part II. LNCS, vol. 13624, pp. 425–437. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-30108-7_36
Maity, K., Kumar, A., Saha, S.: A multitask multimodal framework for sentiment and emotion-aided cyberbullying detection. IEEE Internet Comput. 26(4), 68–78 (2022). https://doi.org/10.1109/MIC.2022.3158583
Article Google Scholar
Maity, K., Saha, S.: A multi-task model for sentiment aided cyberbullying detection in code-mixed Indian languages. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) ICONIP 2021. LNCS, vol. 13111, pp. 440–451. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92273-3_36
Chapter Google Scholar
Mathew, B., Saha, P., Yimam, S.M., Biemann, C., Goyal, P., Mukherjee, A.: Hatexplain: a benchmark dataset for explainable hate speech detection. CoRR abs/2012.10289 (2020)
Google Scholar
Meta: Community standards enforcement – hate speech. Meta Transparency Centre (2022), https://transparency.fb.com/data/community-standards-enforcement/hate-speech. Accessed 1 Apr 2023
Nockleby, J.T.: Hate speech in context: the case of verbal threats. Buffalo Law Rev. 42, 653–713 (1994)
Google Scholar
Panchendrarajan, R., Amaresan, A.: Bidirectional LSTM-CRF for named entity recognition. In: Politzer-Ahles, S., Hsu, Y., Huang, C., Yao, Y. (eds.) Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018, Hong Kong, 1–3 December 2018. Association for Computational Linguistics (2018)
Google Scholar
Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6–12, 2002, Philadelphia, PA, USA, pp. 311–318. ACL (2002). https://doi.org/10.3115/1073083.1073135
Pasupa, K., Karnbanjob, W., Aksornsiri, M.: Hate speech detection in Thai social media with ordinal-imbalanced text classification. In: 19th International Joint Conference on Computer Science and Software Engineering, JCSSE 2022, Bangkok, Thailand, June 22–25, 2022, pp. 1–6. IEEE (2022). https://doi.org/10.1109/JCSSE54890.2022.9836312
Paul, S., Saha, S.: Cyberbert: BERT for cyberbullying identification. Multimedia Syst. 28(6), 1897–1904 (2022). https://doi.org/10.1007/s00530-020-00710-4
Article Google Scholar
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140:1-140:67 (2020)
MathSciNet Google Scholar
Rajani, N.F., McCann, B., Xiong, C., Socher, R.: Explain yourself! leveraging language models for commonsense reasoning. CoRR abs/1906.02361 (2019)
Google Scholar
Reynolds, K., Kontostathis, A., Edwards, L.: Using machine learning to detect cyberbullying. In: 2011 10th International Conference on Machine Learning and Applications and Workshops, vol. 2, pp. 241–244 (2011). https://doi.org/10.1109/ICMLA.2011.152
Saha, T., Upadhyaya, A., Saha, S., Bhattacharyya, P.: A multitask multimodal ensemble model for sentiment- and emotion-aided tweet act classification. IEEE Trans. Comput. Soc. Syst. 9(2), 508–517 (2022). https://doi.org/10.1109/TCSS.2021.3088714
Article Google Scholar
Sancheti, A., Krishna, K., Srinivasan, B.V., Natarajan, A.: Reinforced rewards framework for text style transfer. In: Jose, J.M., et al. (eds.) ECIR 2020, Part I. LNCS, vol. 12035, pp. 545–560. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_36
Chapter Google Scholar
Singh, A., Saha, S., Hasanuzzaman, M., Dey, K.: Multitask learning for complaint identification and sentiment analysis. Cogn. Comput. 14(1), 212–227 (2022). https://doi.org/10.1007/s12559-021-09844-7
Article Google Scholar
Thepgumpanat, P., Naing, S., Tostevin, M.: Anti-myanmar hate speech flares in thailand over virus. Reuters (2020). https://www.reuters.com/article/us-health-coronavirus-thailand-myanmar-idUSKBN28Y0KS. Accessed 1 Apr 2023
Vigna, F.D., Cimino, A., Dell’Orletta, F., Petrocchi, M., Tesconi, M.: Hate me, hate me not: Hate speech detection on facebook. In: Armando, A., Baldoni, R., Focardi, R. (eds.) Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), Venice, Italy, 17–20 January 2017. CEUR Workshop Proceedings, vol. 1816, pp. 86–95. CEUR-WS.org (2017)
Google Scholar
Wanasukapunt, R., Phimoltares, S.: Classification of abusive Thai language content in social media using deep learning. In: 18th International Joint Conference on Computer Science and Software Engineering, JCSSE 2021, Lampang, Thailand, 30 June–2 July 2021, pp. 1–6. IEEE (2021). https://doi.org/10.1109/JCSSE53117.2021.9493829
Waseem, Z., Hovy, D.: Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In: Proceedings of the Student Research Workshop, SRW@HLT-NAACL 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, 12–17 June 2016, pp. 88–93. The Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/n16-2013
Wu, S.: Emmental: a framework for building multimodal multi-task learning systems (2019)
Google Scholar
Zaidan, O., Eisner, J., Piatko, C.D.: Using “annotator rationales” to improve machine learning for text categorization. In: Sidner, C.L., Schultz, T., Stone, M., Zhai, C. (eds.) Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, April 22–27, 2007, Rochester, New York, USA, pp. 260–267. The Association for Computational Linguistics (2007), https://aclanthology.org/N07-1033/
Zhou, X., et al.: Hate speech detection based on sentiment knowledge sharing. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, 1–6 August 2021, pp. 7158–7166. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.acl-long.556

Download references

Acknowledgments

This work was supported by the Ministry of External Affairs (MEA) and the Department of Science & Technology (DST), India, under the ASEAN-India Collaborative R &D Scheme.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103, India
Krishanu Maity, Shaubhik Bhattacharya & Sriparna Saha
School of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
Salisa Phosit, Sawarod Kongsamlit & Kitsuchart Pasupa

Authors

Krishanu Maity
View author publications
You can also search for this author in PubMed Google Scholar
Shaubhik Bhattacharya
View author publications
You can also search for this author in PubMed Google Scholar
Salisa Phosit
View author publications
You can also search for this author in PubMed Google Scholar
Sawarod Kongsamlit
View author publications
You can also search for this author in PubMed Google Scholar
Sriparna Saha
View author publications
You can also search for this author in PubMed Google Scholar
Kitsuchart Pasupa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kitsuchart Pasupa .

Editor information

Editors and Affiliations

CENTAI, Turin, Italy
Gianmarco De Francisci Morales
NYU and Two Sigma, New York, NY, USA
Claudia Perlich
Netflix, Los Angeles, CA, USA
Natali Ruchansky
Telefonica Research, Barcelona, Spain
Nicolas Kourtellis
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Maity, K., Bhattacharya, S., Phosit, S., Kongsamlit, S., Saha, S., Pasupa, K. (2023). Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_9

Download citation

DOI: https://doi.org/10.1007/978-3-031-43427-3_9
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)

Ex-ThaiHate: A Generative Multi-task Framework for Sentiment and Emotion Aware Hate Speech Detection with Explanation in Thai