A Preliminary Analysis of Offensive Language Detection Transferability from Social Media to Video Live Streaming Platforms

Gao, Zhiwei; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji

doi:10.1007/978-3-030-73113-7_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)

Included in the following conference series:

Annual Conference of the Japanese Society for Artificial Intelligence

Abstract

With the growing popularity of online games and eSports, young people are increasingly enjoying video live streaming services for these games. Offensive language directed at the streamer or the audience often appears in a streaming channel’s chatroom. This research aims to detect offensive language in video live streaming chats. Focusing on Twitch, the most popular live streaming platform, we created a dataset for offensive language detection. We collected chat posts from four popular game titles chosen for genre diversity (i.e., competitive, violent, peaceful). To exploit the similarity of offensive language across social media, we apply state-of-the-art models trained on offensive language from Twitter to our Twitch data (i.e., transfer learning). Our results show that transfer from social media to live streaming is effective. However, the similarity measures we proposed correlate only weakly with transferability.
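
The abstract does not say which similarity measures were used between the Twitter and Twitch data. As a purely illustrative sketch, the following Python code computes two simple vocabulary-based measures that are common in domain-similarity work: the Jaccard index and the fraction of the target vocabulary covered by the source. The function names, the whitespace tokenization, and the toy posts are all assumptions made for this example and are not taken from the paper.

```python
from typing import Iterable, Set


def vocabulary(posts: Iterable[str]) -> Set[str]:
    """Collect the lowercased, whitespace-delimited tokens of a corpus."""
    return {token for post in posts for token in post.lower().split()}


def jaccard_similarity(source: Set[str], target: Set[str]) -> float:
    """Jaccard index of two vocabularies: |A & B| / |A | B|."""
    union = source | target
    return len(source & target) / len(union) if union else 0.0


def target_coverage(source: Set[str], target: Set[str]) -> float:
    """Fraction of the target vocabulary that also occurs in the source."""
    return len(source & target) / len(target) if target else 0.0


# Toy, made-up posts standing in for the source (Twitter) and
# target (Twitch) corpora; they are NOT data from the paper.
twitter_posts = ["you are so dumb", "what a great game"]
twitch_posts = ["so dumb lol", "great play gg"]

twitter_vocab = vocabulary(twitter_posts)
twitch_vocab = vocabulary(twitch_posts)

jaccard = jaccard_similarity(twitter_vocab, twitch_vocab)
coverage = target_coverage(twitter_vocab, twitch_vocab)
print(f"Jaccard: {jaccard:.3f}, target coverage: {coverage:.3f}")
```

In a study of this kind, scores such as these would be computed for each source and target pair and then compared with the performance of the transferred model; a weak correlation between the two would mean the measure is a poor predictor of transferability.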

Acknowledgements

This study was supported in part by JSPS KAKENHI Grant Number JP19K20279 and Health and Labor Sciences Research Grant Number H30-shinkougyousei-shitei-004.

Author information

Authors and Affiliations

Nara Institute of Science and Technology, Ikoma, Japan
Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya & Eiji Aramaki

Corresponding author

Correspondence to Zhiwei Gao.

Editor information

Editors and Affiliations

Kansai University, Suita, Osaka, Japan
Katsutoshi Yada
Department of Applied Computer Science, Tokyo Polytechnic University, Atsugi, Kanagawa, Japan
Daisuke Katagami
Graduate School of System Design, Tokyo Metropolitan University, Hino, Tokyo, Japan
Yasufumi Takama
Department of Social Informatics, Kyoto University, Kyoto, Japan
Takayuki Ito
Division of Behavioral Science, Faculty of Letters, Chiba University, Chiba, Chiba, Japan
Akinori Abe
Department of Computer Science, Graduate School of System Design, Tokyo Metropolitan University, Hino, Tokyo, Japan
Eri Sato-Shimokawara
Mathematics and Informatics Center, The University of Tokyo, Tokyo, Japan
Junichiro Mori
Graduate School of Economics, Osaka University, Toyonaka, Osaka, Japan
Naohiro Matsumura
Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
Hisashi Kashima

Cite this paper

Gao, Z., Yada, S., Wakamiya, S., Aramaki, E. (2021). A Preliminary Analysis of Offensive Language Detection Transferability from Social Media to Video Live Streaming Platforms. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_11

DOI: https://doi.org/10.1007/978-3-030-73113-7_11
Published: 23 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73112-0
Online ISBN: 978-3-030-73113-7
eBook Packages: Intelligent Technologies and Robotics (R0)
