Abstract
The rapid development of communication through new technology media such as social networks and mobile phones has improved our lives. However, it also produces collateral problems such as insults and abusive comments. In this work, we address the problem of detecting violent content in text documents using Natural Language Processing techniques. Following a Machine Learning approach, we trained six models resulting from the combination of two text encoders, Term Frequency-Inverse Document Frequency and Bag of Words, with three classifiers: Logistic Regression, Support Vector Machines and Naïve Bayes. We also assessed StarSpace, a Deep Learning approach proposed by Facebook, configured to use Hit@1 accuracy. We evaluated these seven alternatives on two publicly available datasets from the Wikipedia Detox Project: Attack and Aggression. StarSpace achieved accuracies of 0.938 and 0.937 on these datasets, respectively, making it the recommended algorithm, among the alternatives evaluated, for detecting violent content in text documents.
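As a rough illustration of the experimental design the abstract describes, the sketch below enumerates the six encoder/classifier pairings, shows what the two text encodings compute, and evaluates a Hit@1 score. It is not the authors' implementation: the function names, the pre-tokenized toy documents, and the specific TF-IDF variant (raw count times log(N/df)) are all assumptions made for this example, and the classifiers themselves are only named, not trained.

```python
import math
from collections import Counter
from itertools import product

# Encoder and classifier names as listed in the abstract.
ENCODERS = ("TF-IDF", "Bag of Words")
CLASSIFIERS = ("Logistic Regression", "Support Vector Machines", "Naive Bayes")


def model_grid():
    """Return every encoder/classifier pairing (2 x 3 = 6 models)."""
    return list(product(ENCODERS, CLASSIFIERS))


def bag_of_words(docs):
    """Encode each tokenized document as raw term counts."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """Weight term counts by log(N / df).

    Assumption: this is one common TF-IDF variant; the paper may use
    a different smoothing or normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def hit_at_1(ranked_predictions, gold_labels):
    """Fraction of examples whose top-ranked label equals the gold label."""
    hits = sum(1 for ranked, gold in zip(ranked_predictions, gold_labels)
               if ranked[0] == gold)
    return hits / len(gold_labels)


if __name__ == "__main__":
    # Hypothetical toy data, already tokenized.
    docs = [["you", "are", "great"], ["you", "are", "awful", "awful"]]
    for encoder, classifier in model_grid():
        print(f"{encoder} + {classifier}")
    print(bag_of_words(docs)[1])
    print(tf_idf(docs)[1])
    ranked = [["attack", "ok"], ["ok", "attack"], ["ok", "attack"]]
    print(hit_at_1(ranked, ["attack", "ok", "attack"]))
```

When each example has exactly one gold label, Hit@1 is the same quantity as ordinary classification accuracy, so a score computed this way can be compared directly with accuracy figures from the other classifiers.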
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Merayo-Alba, S., Fidalgo, E., González-Castro, V., Alaiz-Rodríguez, R., Velasco-Mata, J. (2019). Use of Natural Language Processing to Identify Inappropriate Content in Text. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_22
DOI: https://doi.org/10.1007/978-3-030-29859-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)