Evaluating Novel Features for Aggressive Language Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

The widespread use and abuse of social media and other platforms to voice opinions online has necessitated the development of tools to regulate this exchange of opinions in light of ethical and legal considerations. In this work, we aim to detect patterns of aggressive language to gain insight into what differentiates it from non-inflammatory language. Of particular interest are features of comments that, taken together, allow this distinction to be made automatically. To that end, we employ feature selection techniques to find optimal feature subsets.

We apply the feature selection and model evaluation process to two independent datasets. Depending on the dataset and model type, between 3 and 19 features are enough to outperform the full set of 68 features. Overall, the best \(F_1\) scores per dataset are 89.4%, using 35 features with a Gaussian SVM, and 82.7%, using 17 features with a linear SVM.

Notes

  1. https://www.perspectiveapi.com/

  2. To account for minimalistic punctuation style, we additionally regarded a newline character as a sentence delimiter.

  3. Full List of Bad Words and Top Swear Words Banned by Google, at https://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/

  4. http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/

  5. Weakly subjective words counted as 0.5, strongly subjective words counted as 1.0.
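
Notes 2 and 5 describe two small preprocessing rules that may be easier to follow in code. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the regular expressions, and the four-word toy lexicon are hypothetical (the entries are invented and not taken from the MPQA lexicon). Summing the weights into a single score is also an assumption, since the notes do not say how the counts are aggregated or normalized.

```python
import re

# Hypothetical toy lexicon for illustration only; these entries and their
# strength labels are invented, not copied from the MPQA lexicon.
TOY_LEXICON = {
    "hate": "strong",
    "terrible": "strong",
    "odd": "weak",
    "quite": "weak",
}

# Weights as stated in note 5.
WEIGHTS = {"weak": 0.5, "strong": 1.0}


def split_sentences(text):
    """Split on sentence-final punctuation and, per note 2, on newlines."""
    # First alternative: whitespace that follows '.', '!' or '?'.
    # Second alternative: any run of newline characters.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def subjectivity_score(text, lexicon=TOY_LEXICON):
    """Sum 0.5 per weakly and 1.0 per strongly subjective word (assumed aggregation)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(WEIGHTS[lexicon[t]] for t in tokens if t in lexicon)


if __name__ == "__main__":
    comment = "no punctuation here\nsecond line. Third one!"
    print(split_sentences(comment))
    print(subjectivity_score("I hate this, it is quite odd"))
```

Treating the newline as a delimiter keeps a comment written as several unpunctuated lines from collapsing into one long sentence, which would otherwise distort any per-sentence feature.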

Author information

Correspondence to Stephan Dreiseitl.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Schuh, T., Dreiseitl, S. (2018). Evaluating Novel Features for Aggressive Language Detection. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_60

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science, Computer Science (R0)
