Use of Natural Language Processing to Identify Inappropriate Content in Text

Merayo-Alba, Sergio; Fidalgo, Eduardo; González-Castro, Víctor; Alaiz-Rodríguez, Rocío; Velasco-Mata, Javier

doi:10.1007/978-3-030-29859-3_22

Conference paper
First Online: 26 August 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Abstract

The rapid development of communication through new technological media such as social networks and mobile phones has improved our lives. However, it has also brought collateral problems, such as the presence of insults and abusive comments. In this work, we address the problem of detecting violent content in text documents using Natural Language Processing techniques. Following a Machine Learning approach, we trained six models resulting from the combination of two text encoders, Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW), with three classifiers: Logistic Regression, Support Vector Machines and Naïve Bayes. We also assessed StarSpace, a Deep Learning approach proposed by Facebook, configured to use Hit@1 accuracy. We evaluated these seven alternatives on two publicly available datasets from the Wikipedia Detox Project: Attack and Aggression. StarSpace achieved accuracies of 0.938 and 0.937 on these datasets, respectively, making it the recommended algorithm for detecting violent content in text documents among the alternatives evaluated.
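
The six encoder and classifier combinations named in the abstract can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' implementation: the choice of scikit-learn, the `LinearSVC` and `MultinomialNB` variants, the `max_iter` setting, the helper name `build_models`, and the toy comments are all assumptions, and the toy data is not the Wikipedia Detox data.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Factories (not instances) so every pipeline gets fresh, unfitted components.
ENCODERS = {
    "BoW": CountVectorizer,
    "TF-IDF": TfidfVectorizer,
}
CLASSIFIERS = {
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
    "SVM": LinearSVC,            # assumption: a linear SVM
    "NaiveBayes": MultinomialNB,  # assumption: the multinomial variant
}


def build_models():
    """Return one pipeline per (encoder, classifier) pair: 2 x 3 = 6 models."""
    return {
        f"{enc_name} + {clf_name}": make_pipeline(make_enc(), make_clf())
        for (enc_name, make_enc), (clf_name, make_clf) in product(
            ENCODERS.items(), CLASSIFIERS.items()
        )
    }


# Made-up toy comments for illustration only (1 = attack, 0 = not an attack).
texts = [
    "you are an idiot",
    "shut up, you fool",
    "nobody wants your stupid edits",
    "thanks for fixing the article",
    "great edit, well sourced",
    "could you add a reference here?",
]
labels = [1, 1, 1, 0, 0, 0]

models = build_models()
for name, model in models.items():
    model.fit(texts, labels)
    # Accuracy on the training data itself; only a smoke test, not an evaluation.
    print(f"{name}: {model.score(texts, labels):.3f}")
```

For a classifier that outputs a single label per comment, the accuracy reported by `score` counts a hit whenever the top-ranked label matches the true one, which is how Hit@1 is usually understood. A real comparison would need a held-out test split rather than the training data used here.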

Author information

Authors and Affiliations

Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
Sergio Merayo-Alba, Eduardo Fidalgo, Víctor González-Castro, Rocío Alaiz-Rodríguez & Javier Velasco-Mata

Corresponding author

Correspondence to Víctor González-Castro.

Editor information

Editors and Affiliations

University of León, León, Spain
Hilde Pérez García
University of León, León, Spain
Lidia Sánchez González
University of León, León, Spain
Manuel Castejón Limas
University of A Coruña, Ferrol, Spain
Héctor Quintián Pardo
University of Salamanca, Salamanca, Spain
Emilio Corchado Rodríguez

About this paper

Cite this paper

Merayo-Alba, S., Fidalgo, E., González-Castro, V., Alaiz-Rodríguez, R., Velasco-Mata, J. (2019). Use of Natural Language Processing to Identify Inappropriate Content in Text. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science (LNAI), vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_22

DOI: https://doi.org/10.1007/978-3-030-29859-3_22
Published: 26 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)
