
Abstract

The Internet is a powerful source of information. However, some of the information available on the Internet is not appropriate for every audience; for instance, pornography should not be shown to children. To this end, several text-filtering algorithms have been proposed that represent web pages with the Vector Space Model (VSM). Nevertheless, such filters can be evaded through various attacks. In this paper, we present the first adult content filtering tool that uses compression algorithms to represent data in a way that is resilient to these attacks. We show that this approach improves on the results of classic VSM models.
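The abstract does not say which compressor the tool uses. As a rough illustration of the general idea of compression-based text classification, the Python sketch below swaps in zlib's DEFLATE (an assumption, not necessarily the authors' choice). It assigns a document to the class whose training text grows least in compressed size when the document is appended. Unlike a VSM representation, the compressor works directly on byte sequences rather than on a bag of word tokens. The function names `compressed_size` and `classify` and the toy corpora are hypothetical.

```python
import zlib
from typing import Mapping, Optional


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(data, 9))


def classify(document: str, corpora: Mapping[str, str]) -> Optional[str]:
    """Return the label whose training text best 'explains' the document.

    The cost of a class is the number of extra compressed bytes the document
    adds when appended to that class's training text. Byte patterns shared
    with the training text can be encoded as back-references, so a smaller
    increase suggests the document resembles that class more closely.
    """
    doc = document.encode("utf-8")
    best_label: Optional[str] = None
    best_cost: Optional[int] = None
    for label, corpus in corpora.items():
        base = corpus.encode("utf-8")
        cost = compressed_size(base + b" " + doc) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training texts; a real filter would use labelled web pages.
    corpora = {
        "adult": "explicit nude xxx adult video " * 20,
        "safe": "school homework science lesson library " * 20,
    }
    print(classify("nude xxx adult video explicit", corpora))
    print(classify("science lesson library school homework", corpora))
```

DEFLATE only looks back 32 KB, so this trick effectively sees just the tail of a large training corpus. Compression-based filters in the spam-filtering literature instead train an adaptive statistical model per class (for example, dynamic Markov compression or PPM) and score a document by how well each model compresses it.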



Author information

Corresponding author

Correspondence to Igor Santos.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos, I., Galán-García, P., Santamaría-Ibirika, A., Alonso-Isla, B., Alabau-Sarasola, I., Bringas, P.G. (2013). Adult Content Filtering through Compression-Based Text Classification. In: Herrero, Á., et al. International Joint Conference CISIS’12-ICEUTE´12-SOCO´12 Special Sessions. Advances in Intelligent Systems and Computing, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33018-6_29


  • DOI: https://doi.org/10.1007/978-3-642-33018-6_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33017-9

  • Online ISBN: 978-3-642-33018-6

  • eBook Packages: Engineering (R0)
