Abstract
Patronizing and Condescending Language (PCL) is a form of implicitly toxic speech directed at vulnerable groups, with the potential to cause them long-term harm. As an emerging area of toxicity detection, PCL detection still lacks high-quality annotated corpora, especially for Chinese. Existing PCL datasets also lack fine-grained annotation of toxicity levels, resulting in a loss of edge-case information. In this paper, we make the first attempt at fine-grained condescension detection in Chinese. First, we propose the CondescendCN Frame, a hierarchical framework for fine-grained condescension detection. On this basis, we introduce CCPC, a hierarchical Chinese corpus for PCL with 11k structured annotations of social media comments from Sina Weibo and Zhihu. We find that adding toxicity strength (TS) effectively improves PCL detection, and we show that the trained model retains solid detection performance when transferred to a larger and more diverse collection of media data (over 120k comments). Because PCL is subjectively ambiguous, richer contextual information and broader domain knowledge are critically needed in this field.
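The abstract does not specify how toxicity strength (TS) is incorporated into the detector. The sketch below is a minimal, hypothetical illustration of the general idea: a shared sentence representation feeds two heads, one for binary PCL detection and one for an ordinal TS level, trained with a jointly weighted loss. The class names, the number of TS levels, the hidden size, and the loss weight are all assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): joint PCL detection
# with toxicity strength (TS) as an auxiliary signal. The input embedding
# stands in for any sentence encoder output (e.g., a BERT [CLS] vector).
import torch
import torch.nn as nn

class PCLWithToxicityStrength(nn.Module):
    def __init__(self, hidden_size: int = 768, num_ts_levels: int = 4):
        # hidden_size and num_ts_levels are illustrative assumptions.
        super().__init__()
        self.pcl_head = nn.Linear(hidden_size, 2)             # PCL vs. non-PCL
        self.ts_head = nn.Linear(hidden_size, num_ts_levels)  # toxicity strength level

    def forward(self, sentence_embedding: torch.Tensor):
        return self.pcl_head(sentence_embedding), self.ts_head(sentence_embedding)

def joint_loss(pcl_logits, ts_logits, pcl_labels, ts_labels, ts_weight: float = 0.5):
    """Cross-entropy on both tasks; ts_weight is an assumed hyperparameter."""
    ce = nn.CrossEntropyLoss()
    return ce(pcl_logits, pcl_labels) + ts_weight * ce(ts_logits, ts_labels)

# Toy usage with random embeddings in place of real encoder outputs.
model = PCLWithToxicityStrength()
emb = torch.randn(8, 768)
pcl_logits, ts_logits = model(emb)
loss = joint_loss(pcl_logits, ts_logits,
                  torch.randint(0, 2, (8,)), torch.randint(0, 4, (8,)))
loss.backward()
```

In this kind of multi-task setup, the auxiliary TS head acts as a regularizer that pushes the shared representation to encode graded toxicity rather than only a binary decision, which is one plausible reading of why adding TS helps PCL detection.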
Notes
- 1.
Our dataset and code are available at https://github.com/dut-laowang/CCPC.