Abstract
The Internet is a powerful source of information. However, some of the information available on the Internet is not suitable for every audience. For instance, pornography should not be shown to children. To this end, several text filtering algorithms have been proposed that employ a Vector Space Model (VSM) representation of webpages. Nevertheless, this type of filter can be circumvented using different attacks. In this paper, we present the first adult content filtering tool that employs compression algorithms to represent data, an approach that is resilient to these attacks. We show that this approach improves on the results of classic VSM approaches.
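As a rough illustration of the general idea behind compression-based text classification, the sketch below assigns a document to the class whose training text helps compress it best. It is not the authors' tool: it assumes Python's standard `zlib` compressor as a stand-in for whatever compression models the paper uses, and the function names, class labels, and toy corpora are hypothetical.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Additional compressed bytes needed to encode `document` after `corpus`.

    A small value means the document shares many substrings with the corpus.
    """
    base = corpus.encode("utf-8")
    joined = base + b" " + document.encode("utf-8")
    return compressed_size(joined) - compressed_size(base)


def classify(document: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose corpus yields the smallest extra compressed size."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], document))


# Hypothetical toy corpora; a real filter would train on labelled webpages.
CORPORA = {
    "cooking": (
        "preheat the oven and bake the dough with butter and sugar. "
        "whisk the flour, eggs and milk into a smooth batter."
    ),
    "sports": (
        "the team scored a late goal and won the match. "
        "the striker passed the ball to the winger near the penalty area."
    ),
}

if __name__ == "__main__":
    print(classify("bake the dough with butter and sugar", CORPORA))
    print(classify("the striker scored a late goal", CORPORA))
```

Because the compressor works on raw character sequences rather than on a fixed vocabulary of tokens, no tokenisation or term-weighting step is involved, which is the property that distinguishes this family of methods from a VSM representation.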
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santos, I., Galán-García, P., Santamaría-Ibirika, A., Alonso-Isla, B., Alabau-Sarasola, I., Bringas, P.G. (2013). Adult Content Filtering through Compression-Based Text Classification. In: Herrero, Á., et al. International Joint Conference CISIS’12-ICEUTE´12-SOCO´12 Special Sessions. Advances in Intelligent Systems and Computing, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33018-6_29
DOI: https://doi.org/10.1007/978-3-642-33018-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33017-9
Online ISBN: 978-3-642-33018-6
eBook Packages: Engineering, Engineering (R0)