Abstract
The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources include blogs, forums, media-sharing, Q&A and chat—using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and feature enhancement, demonstrating the impact of these techniques on detection accuracies. In addition, we investigate the need for sampling on imbalanced datasets. Our conclusions are: (1) Dataset balancing boosts accuracies significantly for social media abusive content detection; (2) Feature reduction, important for large feature sets that are typical of social media datasets, improves efficiency whilst maintaining detection accuracies; (3) The use of generic structural features common across all our datasets proved to be of limited use in the automatic detection of abusive content. Our findings can support practitioners in selecting appropriate text mining strategies in this area.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
Example, on D3, SVD reduction to 10 % features took 20.75 s while chi-square reduction took 0.03 s.
References
Olweus, D.: Bullying at school. What we know and what we can do (1993)
Del Bosque, L.P., Garza, S.E.: Aggressive text detection for cyberbullying. In: Mexican International Conference on Artificial Intelligence, pp. 221–232. Springer (2014)
Smith, P.K., Mahdavi, J., Carvalho, M., Fisher, S., Russell, S., Tippett, N.: Cyberbullying: its nature and impact in secondary school pupils. J. Child Psychol. Psychiatry 49(4), 376–385 (2008)
Yin, D., Xue, Z., Hong, L., Davison, B.D., Kontostathis, A., Edwards, L.: Detection of harassment on web 2.0. Proc. Content Anal. WEB 2, 1–7 (2009)
Pak, A., Paroubek, Patrick: Twitter as a corpus for sentiment analysis and opinion mining. LREC 10, 1320–1326 (2010)
Pang, Bo, Lee, Lillian: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
Huang, C., Jiang, Q., Zhang, Y.: Detecting comment spam through content analysis. In: Web-Age Information Management, pp. 222–233. Springer (2010)
Mccord, M., Chuah, M.: Spam detection on twitter using traditional classifiers. In: Autonomic and Trusted Computing, pp. 175–186. Springer (2011)
Cohen, R., Ruths, D.: Classifying political orientation on twitter: it’s not easy! In ICWSM (2013)
Xiaohui, Yu., Liu, Yang, Huang, Xiangji, An, Aijun: Mining online reviews for predicting sales performance: a case study in the movie domain. IEEE Trans. Knowl. Data Eng. 24(4), 720–734 (2012)
Bayzick, J., Kontostathis, A., Edwards, L.: Detecting the presence of cyberbullying using computer software (2011)
Dadvar, M., Trieschnigg, D., de Jong, F.: Experts and machines against bullies: a hybrid approach to detect cyberbullies. In: Advances in Artificial Intelligence, pp. 275–281. Springer (2014)
Mangaonkar, A., Hayrapetian, A., Raje, R.: Collaborative detection of cyberbullying behavior in twitter data. In: 2015 IEEE International Conference on Electro/Information Technology (EIT), pp. 611–616. IEEE (2015)
Xu, J.-M., Jun, K.-S., Zhu, X., Bellmore, A.: Learning from bullying traces in social media. In: Proceedings of the 2012 Conf of the Nth American chapter of the ACL: Human Language Technologies, pp. 656–666. ACL (2012)
Burnap, P., Williams, M.L.: Cyber hate speech on twitter: an application of machine classification and statistical modeling for policy and decision making. Policy Internet (2015)
Reynolds, K., Kontostathis, A., Edwards, L.: Using machine learning to detect cyberbullying. In: 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 2, pp. 241–244. IEEE (2011)
Sood, S., Antin, J., Churchill, E.: Profanity use in online communities. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1481–1490. ACM (2012)
Hosseinmardi, H., Mattson, S.A., Rafiq, R.I., Han, R., Lv, Q., Mishra, S.: Detection of cyberbullying incidents on the instagram social network (2015). arXiv:1503.03909
Dadvar, M., de Jong, F.M.G., Ordelman, R.J.F., Trieschnigg, R.B.: Improved cyberbullying detection using gender information (2012)
Chen, Y., Zhou, Y., Zhu, S., Xu, H.: Detecting offensive language in social media to protect adolescent online safety. In: Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on Social Computing (SocialCom), pp. 71–80. IEEE (2012)
Dadvar, M., Trieschnigg, D., Ordelman, R., de Jong, F.: Improving cyberbullying detection with user context. In: European Conference on Information Retrieval, pp. 693–696. Springer (2013)
Lieberman, Henry, Dinakar, Karthik, Jones, Birago: Let’s gang up on cyberbullying. Computer 44(9), 93–96 (2011)
Dinakar, K., Reichart, R., Lieberman, H.: Modeling the detection of textual cyberbullying. In: The Social Mobile Web (2011)
Xiang, G., Fan, B., Wang, L., Hong, J., Rose, C.: Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1980–1984. ACM (2012)
da Silva, N.F.F., Hruschka, E.R., Hruschka, E.R.: Tweet sentiment analysis with classifier ensembles. Decis. Support Syst. 66, 170–179 (2014)
Dadvar, M., Trieschnigg, D., de Jong, F.: Experts and machines against bullies: a hybrid approach to detect cyberbullies. In: Canadian Conference on Artificial Intelligence, pp. 275–281. Springer (2014)
Nahar, Vinita, Li, Xue, Pang, Chaoyi: An effective approach for cyberbullying detection. Commun. Inf. Sci. Manage. Eng. 3(5), 238 (2013)
Xu, J.-M., Zhu, X., Bellmore, A.: Fast learning for sentiment analysis on bullying. In: Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, p. 10. ACM (2012)
Ganganwar, V.: An overview of classification algorithms for imbalanced datasets. Int. J. Emerg. Technol. Adv. Eng. 2(4), 42–47 (2012)
Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook, pp. 853–867. Springer (2005)
Cunningham, P.: Dimension reduction. In: Machine Learning Techniques for Multimedia, pp. 91–112. Springer (2008)
Sebastiani F.: Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 34(1), 1–47 (2002)
Domingos P.: A few useful things to know about machine learning. CACM 55(10), 78–87 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, H., Mckeever, S., Delany, S.J. (2017). Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-46562-3_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46561-6
Online ISBN: 978-3-319-46562-3
eBook Packages: EngineeringEngineering (R0)