Evaluating Novel Features for Aggressive Language Detection

Schuh, Tina; Dreiseitl, Stephan

doi:10.1007/978-3-319-99579-3_60

Tina Schuh and Stephan Dreiseitl

Conference paper
First Online: 25 August 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

The widespread use and abuse of social media and other platforms to voice opinions online has necessitated the development of tools to regulate this exchange of opinions in light of ethical and legal considerations. In this work, we aim to detect patterns of aggressive language to gain insight into what differentiates it from non-inflammatory language. Of particular interest are features of comments that, taken together, allow this distinction to be made automatically. To that end, we employ feature selection techniques to find optimal feature subsets.
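The abstract does not name the feature selection technique used. As a minimal sketch only, the following shows greedy forward selection, one common choice swapped in here for illustration. The feature names and the `toy_score` function are hypothetical; in practice the scorer would be something like a cross-validated \(F_1\) score of a classifier trained on the candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves `score`; stop when none helps."""
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best:
            break  # no single feature improves the score any further
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    # Hypothetical stand-in for a model evaluation: two features are
    # informative, and every selected feature costs a small penalty.
    useful = {"profanity_ratio": 0.5, "subjectivity": 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen, value = forward_select(
    ["length", "profanity_ratio", "subjectivity", "uppercase_ratio"], toy_score
)
```

With this toy scorer the search stops after two features, which mirrors the general idea that a small subset can match or beat the full feature set.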

We apply the feature selection and model evaluation process to two independent datasets. Depending on the dataset and model type, between 3 and 19 features are enough to outperform the full set of 68 features. Overall, the best \(F_1\) scores per dataset are 89.4% (35 features, Gaussian SVM) and 82.7% (17 features, linear SVM).
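For readers unfamiliar with the metric, the \(F_1\) score is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the counts in the example are made up for illustration and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)  # share of flagged comments that are aggressive
    recall = tp / (tp + fn)     # share of aggressive comments that were flagged
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 80 aggressive comments found, 10 false alarms, 20 missed.
example = f1_score(tp=80, fp=10, fn=20)
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot reach a high \(F_1\) score by maximizing only precision or only recall.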

Notes

1. https://www.perspectiveapi.com/.
2. To account for minimalistic punctuation style, we additionally regarded a newline character as a sentence delimiter.
3. Full List of Bad Words and Top Swear Words Banned by Google at https://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/.
4. http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/.
5. Weakly subjective words counted as 0.5, strongly subjective words counted as 1.0.
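The notes on sentence delimiters and subjectivity weights can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the punctuation set `.!?`, the tokenizer, and the three-word `TOY_LEXICON` (with its `"weak"`/`"strong"` labels) are hypothetical stand-ins, and only the newline rule and the 0.5/1.0 weights come from the notes.

```python
import re
from typing import Dict, List

# Assumed delimiters: runs of . ! ? or, per the note above, newlines.
_SENTENCE_END = re.compile(r"[.!?]+|\n+")


def split_sentences(comment: str) -> List[str]:
    """Split a comment into sentences, treating newlines as delimiters too."""
    parts = _SENTENCE_END.split(comment)
    return [p.strip() for p in parts if p.strip()]


# Hypothetical toy entries standing in for a real subjectivity lexicon.
TOY_LEXICON: Dict[str, str] = {"hate": "strong", "awful": "strong", "think": "weak"}
WEIGHTS = {"weak": 0.5, "strong": 1.0}  # weights as stated in the note above


def subjectivity_score(comment: str) -> float:
    """Sum the weights of all lexicon words that occur in the comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return sum(WEIGHTS[TOY_LEXICON[t]] for t in tokens if t in TOY_LEXICON)


sentences = split_sentences("no way\nyou are wrong. really")
score = subjectivity_score("I think this is awful, I HATE it")
```

Treating the newline as a delimiter keeps comments written without terminal punctuation from collapsing into a single long sentence, which would otherwise distort any per-sentence feature.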

Author information

Authors and Affiliations

Chair of Software Engineering, University of Passau, 94032, Passau, Germany
Tina Schuh
Department of Software Engineering, Upper Austria University of Applied Sciences, 4232, Hagenberg, Austria
Stephan Dreiseitl

Corresponding author

Correspondence to Stephan Dreiseitl.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Schuh, T., Dreiseitl, S. (2018). Evaluating Novel Features for Aggressive Language Detection. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_60

DOI: https://doi.org/10.1007/978-3-319-99579-3_60
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
