Abstract:
The WWW is a major source of unintentional exposure to pornography. Current content filtering technology using blacklisting or simple keyword searching is ineffective: today's filters produce many false positives and false negatives and require tedious manual updating. This study examined how content filtering of pornographic Web page text, based on structural and statistical analysis, could greatly improve accuracy. Systematic differences between pornographic and non-pornographic Web pages were found, with Bayesian classification yielding 99.1% accuracy in classifying text from pornographic and non-pornographic corpora.
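The abstract names Bayesian classification of page text but does not say which variant, features, or smoothing the authors used. As a rough illustration only, the sketch below assumes a multinomial naive Bayes model over whitespace-separated tokens with add-one (Laplace) smoothing. The function names `train` and `classify`, the labels `blocked` and `allowed`, and the four toy training strings are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def train(docs):
    """Count tokens per label. `docs` is a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of token frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()             # every token seen in training
    for text, label in docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior for `text`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one smoothing keeps unseen (token, label) pairs above zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, invented training data (an assumption for illustration only).
TRAINING = [
    ("adult explicit content click here", "blocked"),
    ("explicit adult pictures free", "blocked"),
    ("weather forecast for the weekend", "allowed"),
    ("city council meeting agenda", "allowed"),
]

model = train(TRAINING)
print(classify(model, "free explicit pictures"))
print(classify(model, "weekend weather agenda"))
```

Unlike a keyword blacklist, a model of this kind weighs every known token in the page against both classes, which is one plausible reason a statistical approach could reduce false positives and false negatives. A four-document toy corpus says nothing about the 99.1% figure reported above.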
Published in: 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583)
Date of Conference: 10-13 October 2004
Date Added to IEEE Xplore: 07 March 2005
Print ISBN: 0-7803-8566-7
Print ISSN: 1062-922X