DOI: 10.1145/2505515.2507849
poster

Objectionable content filtering by click-through data

Published: 27 October 2013

Abstract

This paper explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals the contextual information of web browsing behavior. We extract behavioral features from each clicked URL (hostname, bag-of-words, gTLD, IP, and port) to build a linear-chain conditional random field (CRF) model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 in identifying objectionable accesses without fetching the corresponding page content. Error analysis indicates that the proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, the model achieves a macro-averaged blocking rate of 0.9271 while keeping the macro-averaged over-blocking rate low at 0.0575, collaboratively filtering objectionable content as the dynamic web changes over time.
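The per-click feature extraction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gTLD set, the tokenizer, the reading of the "IP" feature as "the host is a literal IP address" (the paper may instead use the resolved address), the `prev_host` context feature, and the helper names `url_features` and `trail_features` are all hypothetical. Each click becomes a dictionary of sparse features, and a trail becomes a list of such dictionaries, which is the per-sequence input shape that linear-chain CRF toolkits such as sklearn-crfsuite accept.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: an illustrative set of generic TLDs; the paper does not list the ones it used.
GENERIC_TLDS = {"com", "net", "org", "info", "biz", "edu", "gov", "mil", "int", "xxx"}


def url_features(url):
    """Hypothetical helper: hostname, bag-of-words, gTLD, IP, and port features for one clicked URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    feats = {"host=" + host: 1.0}

    # Bag-of-words: count alphanumeric tokens in the host, path, and query (assumed tokenizer).
    text = " ".join([host, parts.path, parts.query]).lower()
    for tok in re.findall(r"[a-z0-9]+", text):
        key = "bow=" + tok
        feats[key] = feats.get(key, 0.0) + 1.0

    # IP: assumed here to mean "the host is a literal IPv4/IPv6 address".
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False
    if is_ip:
        feats["is_ip"] = 1.0

    # gTLD: last host label if it is in the assumed generic set, otherwise "none".
    tld = "" if is_ip else host.rsplit(".", 1)[-1]
    feats["gtld=" + (tld if tld in GENERIC_TLDS else "none")] = 1.0

    # Port: the explicit port, or "default" when the URL omits it.
    port = parts.port
    feats["port=" + (str(port) if port is not None else "default")] = 1.0
    return feats


def trail_features(trail):
    """Hypothetical helper: map an access trail (URLs in click order) to one feature dict per click."""
    seq = []
    for i, url in enumerate(trail):
        feats = url_features(url)
        if i > 0:
            # Assumed context feature: the previous click's hostname.
            prev_host = (urlsplit(trail[i - 1]).hostname or "").lower()
            feats["prev_host=" + prev_host] = 1.0
        seq.append(feats)
    return seq


trail = [
    "http://www.example.com/news/index.html",
    "http://203.0.113.7:8080/videos?id=42",
]
for feats in trail_features(trail):
    print(sorted(feats))
```

In a linear-chain CRF, weights are learned over these per-click observation features together with transition weights between the categories of consecutive clicks; the transitions are what let the surrounding trail inform the predicted category of each access.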

Cited By

  • (2023) Filtering objectionable information access based on click-through behaviours with deep learning methods. Journal of Information Science. DOI: 10.1177/01655515231160041. Online publication date: 7 March 2023.
  • (2023) Multi-Perspective Learning to Rank to Support Children's Information Seeking in the Classroom. 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 311–317. DOI: 10.1109/WI-IAT59888.2023.00050. Online publication date: 26 October 2023.
  • (2018) Towards ontology-based multilingual URL filtering: a big data problem. The Journal of Supercomputing 74, 10, 5003–5021. DOI: 10.1007/s11227-018-2338-1 (also listed as 10.5555/3288339.3288354). Online publication date: 1 October 2018.

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. click-through mining
  2. collaborative filtering
  3. internet censorship

Qualifiers

  • Poster

Conference

CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
October 27 – November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 paper acceptance rate: 143 of 848 submissions (17%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Jan 2025
