Abstract
With the growing popularity of online games and eSports, video live streaming services are increasingly popular among the younger generation. Offensive messages directed at the streamer or the audience often appear in a streaming channel’s chatroom. This research aims to detect offensive language in video live streaming chats. Focusing on Twitch, the most popular live streaming platform, we created a dataset for offensive language detection. We collected chat posts across four popular game titles with genre diversity (i.e., competitive, violent, peaceful). To exploit the similarity of offensive language across social media, we apply state-of-the-art models trained on offensive language from Twitter to our Twitch data (i.e., transfer learning). Our results show that transfer from social media to live streaming is effective. However, the similarity measures we proposed show only weak correlation with transferability, so they are of limited use for predicting it.
Acknowledgements
This study was supported in part by JSPS KAKENHI Grant Number JP19K20279 and Health and Labor Sciences Research Grant Number H30-shinkougyousei-shitei-004.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, Z., Yada, S., Wakamiya, S., Aramaki, E. (2021). A Preliminary Analysis of Offensive Language Detection Transferability from Social Media to Video Live Streaming Platforms. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_11
DOI: https://doi.org/10.1007/978-3-030-73113-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73112-0
Online ISBN: 978-3-030-73113-7
eBook Packages: Intelligent Technologies and Robotics (R0)