Automatic Extraction of Harmful Sentence Patterns with Application in Cyberbullying Detection

Ptaszynski, Michal; Masui, Fumito; Kimura, Yasutomo; Rzepka, Rafal; Araki, Kenji

doi:10.1007/978-3-319-93782-3_25


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference


Abstract

The problem of humiliating and slandering people through the Internet, generally defined as cyberbullying (hereafter: CB), has recently been recognized as a serious social problem that harms the mental health of Internet users. In Japan, to deal with the problem, members of the Parent-Teacher Association (PTA) perform Internet Patrol, a voluntary activity of manually reading through Web content to spot cyberbullying entries. To help PTA members, we propose a novel method for the automatic detection of malicious content on the Internet. The method is based on a combinatorial approach to language modeling inspired by brute-force search algorithms. It automatically extracts sophisticated sentence patterns and uses them in classification. We tested the method on actual data containing cyberbullying entries, provided by the Human Rights Center. The results show that our method outperformed previous methods. It is also more efficient, as it requires minimal human effort.

A previous version of this paper appeared in: Proceedings of the 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC’15), pp. 370–375.
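The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' implementation. Sentences are assumed to be already tokenized (the notes point to the MeCab morphological analyzer for Japanese; whitespace splitting stands in for it here). Every ordered combination of tokens is treated as a candidate pattern, with a `*` wildcard marking a gap between non-adjacent tokens. Each pattern then receives a hypothetical weight in [-1, 1] from its relative frequency in harmful versus benign sentences, and a new sentence is scored by summing the weights of the patterns it contains. The names `extract_patterns`, `pattern_weights`, `score`, `classify` and the `max_len` and `threshold` parameters are illustrative only.

```python
from collections import Counter
from itertools import combinations


def extract_patterns(tokens, max_len=3):
    """Return all ordered combinations of 1..max_len tokens from one sentence.

    Tokens that were not adjacent in the sentence are joined by a "*"
    wildcard, so "a * c" means "a", then one or more tokens, then "c".
    max_len is a hypothetical cap that limits the combinatorial explosion.
    """
    patterns = set()
    n = len(tokens)
    for k in range(1, min(max_len, n) + 1):
        for idx in combinations(range(n), k):  # index tuples keep word order
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:  # the chosen tokens had something between them
                    parts.append("*")
                parts.append(tokens[cur])
            patterns.add(" ".join(parts))
    return patterns


def pattern_weights(harmful_sents, benign_sents, max_len=3):
    """Assumed weighting: +1 = seen only in harmful data, -1 = only in benign data."""
    pos, neg = Counter(), Counter()
    for toks in harmful_sents:
        pos.update(extract_patterns(toks, max_len))  # counted once per sentence
    for toks in benign_sents:
        neg.update(extract_patterns(toks, max_len))
    return {
        p: (pos[p] / (pos[p] + neg[p]) - 0.5) * 2
        for p in pos.keys() | neg.keys()
    }


def score(tokens, weights, max_len=3):
    """Sum the weights of all known patterns found in the sentence."""
    return sum(weights.get(p, 0.0) for p in extract_patterns(tokens, max_len))


def classify(tokens, weights, max_len=3, threshold=0.0):
    """Flag the sentence as harmful if its score exceeds the threshold."""
    return score(tokens, weights, max_len) > threshold


# Toy usage with invented English examples (not data from the paper).
harmful = [s.split() for s in ["you are so ugly", "ugly and stupid kid"]]
benign = [s.split() for s in ["you are so kind", "nice and smart kid"]]
weights = pattern_weights(harmful, benign)
print(score("she is so ugly".split(), weights))     # 2.0
print(classify("she is so kind".split(), weights))  # False
```

Enumerating every ordered subset grows exponentially with sentence length (2^n − 1 patterns for n distinct tokens once `max_len` reaches n), which is why some cap or filtering step is needed in practice; the exact weighting, matching, and thresholding used in the paper may differ from the choices made here.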

Notes

1. An organization composed of parents and school personnel.
2. http://www.ncpc.org/cyberbullying.
3. http://www.pref.mie.lg.jp/jinkenc/hp/.
4. http://taku910.github.io/mecab/.
5. https://www.google.com/events/policy/anti-harassmentpolicy.html.
6. https://blog.twitter.com/2014/building-a-safer-twitter.
7. http://www.dmarc.org/.

Author information

Authors and Affiliations

Department of Computer Science, Kitami Institute of Technology, Kitami, Japan
Michal Ptaszynski & Fumito Masui
Department of Information and Management Science, Otaru University of Commerce, Otaru, Japan
Yasutomo Kimura
Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
Rafal Rzepka & Kenji Araki

Corresponding author

Correspondence to Michal Ptaszynski.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Ptaszynski, M., Masui, F., Kimura, Y., Rzepka, R., Araki, K. (2018). Automatic Extraction of Harmful Sentence Patterns with Application in Cyberbullying Detection. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_25


DOI: https://doi.org/10.1007/978-3-319-93782-3_25
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)
