Abstract
The paper presents a method for detecting pornography in web pages based on natural language processing. The described classification method uses a feature set of single words and groups of words. Syntactic analysis is performed to extract collocations. A modification of TF-IDF is used to weight terms. An evaluation and comparison of classification quality and performance are given.
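As a rough illustration of the kind of term weighting the abstract refers to, the sketch below computes plain TF-IDF weights over single words and word pairs. It rests on several assumptions: the abstract does not specify the authors' modification of TF-IDF, so the textbook formula is used here; adjacent word pairs stand in for collocations, which the paper extracts through syntactic analysis; and the names `extract_terms` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter


def extract_terms(text):
    """Return single words plus adjacent word pairs from a text.

    Assumption: adjacent pairs are a crude stand-in for the
    syntactically extracted collocations mentioned in the abstract.
    """
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def tfidf_weights(documents):
    """Compute textbook TF-IDF weights (not the paper's modified variant)."""
    term_lists = [extract_terms(doc) for doc in documents]
    n_docs = len(term_lists)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        # Relative term frequency times log inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Calling `tfidf_weights(["free adult video", "free weather report"])` returns one dictionary per document. A term that occurs in every document, such as "free" here, receives weight zero, while terms specific to one document receive positive weights that a classifier could use as features.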
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Suvorov, R., Sochenkov, I., Tikhomirov, I. (2013). Method for Pornography Filtering in the WEB Based on Automatic Classification and Natural Language Processing. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_31
DOI: https://doi.org/10.1007/978-3-319-01931-4_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)