Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media

Chen, Hao; Mckeever, Susan; Delany, Sarah Jane

doi:10.1007/978-3-319-46562-3_12

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 513)

Abstract

Cyberbullying and online harassment have gained considerable coverage in recent years. Social media providers need to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques to the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and feature enhancement, and demonstrate the impact of these techniques on detection accuracy. In addition, we investigate the need for sampling on imbalanced datasets. Our conclusions are: (1) dataset balancing boosts accuracy significantly for abusive content detection in social media; (2) feature reduction, important for the large feature sets typical of social media datasets, improves efficiency whilst maintaining detection accuracy; (3) generic structural features common across all our datasets proved to be of limited use in the automatic detection of abusive content. Our findings can support practitioners in selecting appropriate text mining strategies in this area.
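To make the idea of dataset balancing concrete, the following is a minimal sketch of random undersampling, one common way to balance an imbalanced dataset. The abstract does not say which sampling method the chapter uses, so the technique, the function name `undersample`, and the toy messages and labels below are all illustrative assumptions rather than the authors' procedure.

```python
import random
from collections import Counter


def undersample(texts, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the texts by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # The smallest class sets the target size for every class.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return [t for t, _ in balanced], [y for _, y in balanced]


# Toy, made-up data: 1 = abusive, 0 = not abusive (heavily imbalanced).
texts = [
    "you are stupid",
    "nobody likes you",
    "thanks for the help",
    "see you at lunch",
    "great video",
    "what time is the game",
    "nice answer",
    "are you free today",
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

bal_texts, bal_labels = undersample(texts, labels)
print(Counter(labels), "->", Counter(bal_labels))
```

Undersampling discards majority-class examples; oversampling the minority class is an alternative that keeps all of the data, at the cost of repeated examples.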


Notes

1. http://caw2.barcelonamedia.org
2. For example, on D3, SVD reduction to 10% of the features took 20.75 s, while chi-square reduction took 0.03 s.
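As an illustration of why chi-square reduction is cheap, the sketch below scores each term with the standard chi-square statistic for a 2x2 term/class contingency table and keeps the top 10% of terms. It needs only document counts, with no matrix factorisation. The whitespace tokenisation, the function names `chi_square_scores` and `select_top_fraction`, and the toy documents are assumptions made for this example, not the chapter's implementation.

```python
import math


def chi_square_scores(docs, labels, positive=1):
    """Chi-square score of each term against a binary class label,
    based on whether the term is present in a document."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term within each class.
    pos_df, neg_df = {}, {}
    for doc, y in zip(docs, labels):
        counts = pos_df if y == positive else neg_df
        for term in set(doc.lower().split()):
            counts[term] = counts.get(term, 0) + 1

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df.get(term, 0)  # positive docs containing the term
        b = neg_df.get(term, 0)  # negative docs containing the term
        c = n_pos - a            # positive docs without the term
        d = n_neg - b            # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_fraction(scores, fraction=0.10):
    """Keep the highest-scoring fraction of terms (at least one term).
    Ties are broken alphabetically so the result is deterministic."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy, made-up data: 1 = abusive, 0 = not abusive.
docs = [
    "you are stupid",
    "stupid and ugly",
    "you are kind",
    "thanks for the help",
    "see you at lunch",
    "are you free today",
]
labels = [1, 1, 0, 0, 0, 0]

scores = chi_square_scores(docs, labels)
selected = select_top_fraction(scores, fraction=0.10)
print(selected)
```

A single pass over the documents is enough to fill every contingency table, which is consistent with the large gap in running time the note reports against SVD.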


Author information

Authors and Affiliations

Dublin Institute of Technology, Dublin, Ireland
Hao Chen, Susan Mckeever & Sarah Jane Delany

Corresponding author

Correspondence to Hao Chen.

Editor information

Editors and Affiliations

School of Computing and Communications, Lancaster University, Bailrigg, Lancaster, United Kingdom
Plamen Angelov
School of Computing, University of Portsmouth, Portsmouth, Hampshire, United Kingdom
Alexander Gegov
School of Comp. Sci. & Digital Media, Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
Ins. of Mathematics, Physics & Comp. Sci, Aberystwyth University, Aberystwyth, United Kingdom
Qiang Shen

About this paper

Cite this paper

Chen, H., Mckeever, S., Delany, S.J. (2017). Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_12

DOI: https://doi.org/10.1007/978-3-319-46562-3_12
Published: 07 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46561-6
Online ISBN: 978-3-319-46562-3
eBook Packages: Engineering, Engineering (R0)
