A Preliminary Analysis of Offensive Language Detection Transferability from Social Media to Video Live Streaming Platforms

  • Conference paper
Advances in Artificial Intelligence (JSAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)

Abstract

Given the growing popularity of online games and eSports, younger audiences are increasingly turning to video live streaming services. Offensive conversations often arise in a streaming channel’s chatroom, targeting the streamer or members of the audience. This research aims to detect offensive language in video live streaming chats. Focusing on Twitch, the most popular live streaming platform, we created a dataset for offensive language detection, collecting chat posts across four popular game titles chosen for genre diversity (i.e., competitive, violent, peaceful). To exploit the similarity of offensive language across social media platforms, we apply state-of-the-art models trained on offensive language from Twitter to our Twitch data (i.e., transfer learning). Our results show that transfer from social media to live streaming is effective. However, the similarity measures we proposed show only weak correlation with transferability.
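The abstract does not state which similarity measures were proposed. As a purely illustrative sketch, one simple way to compare a source corpus (e.g., Twitter posts) with a target corpus (e.g., Twitch chat) is the Jaccard overlap of their vocabularies. The function names, the whitespace tokenization, and the toy posts below are all assumptions made for illustration; they are not the authors' method or data.

```python
def vocabulary(posts):
    """Collect the set of lowercase whitespace tokens across all posts."""
    return {token for post in posts for token in post.lower().split()}


def jaccard_similarity(source_posts, target_posts):
    """Jaccard overlap between the vocabularies of two corpora, in [0, 1]."""
    source_vocab = vocabulary(source_posts)
    target_vocab = vocabulary(target_posts)
    union = source_vocab | target_vocab
    if not union:
        # Two empty corpora share no evidence; report zero by convention.
        return 0.0
    return len(source_vocab & target_vocab) / len(union)


# Hypothetical toy data, not drawn from the paper's dataset.
twitter_posts = ["you are so bad", "great game today"]
twitch_posts = ["so bad lol", "great play"]
similarity = jaccard_similarity(twitter_posts, twitch_posts)
print(f"vocabulary overlap: {similarity:.3f}")
```

A score like this could then be compared against the performance of a Twitter-trained classifier on each Twitch game title to check whether higher similarity goes along with better transfer; the abstract reports that, for the measures the authors tried, this relationship was weak.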

Notes

  1. https://www.twitch.tv

  2. https://en.wikipedia.org/wiki/Twitch_(service)

  3. https://www.metacritic.com/

  4. https://www.gamepressure.com/

  5. https://github.com/carpedm20/emoji

Acknowledgements

This study was supported in part by JSPS KAKENHI Grant Number JP19K20279 and Health and Labor Sciences Research Grant Number H30-shinkougyousei-shitei-004.

Corresponding author

Correspondence to Zhiwei Gao.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, Z., Yada, S., Wakamiya, S., Aramaki, E. (2021). A Preliminary Analysis of Offensive Language Detection Transferability from Social Media to Video Live Streaming Platforms. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_11
