DOI: 10.1145/1871437.1871638
poster

Web page classification on child suitability

Published: 26 October 2010

ABSTRACT

Children spend significant amounts of time on the Internet. Recent studies have shown that during these periods they are often not under adult supervision. This work presents an automatic approach to identifying suitable web pages for children based on topical and non-topical web page aspects. We discuss the characteristics of children's web sites with respect to recent findings in children's psychology and cognitive sciences. Finally, we evaluate our approach in a large-scale user study, finding that it compares favourably to state-of-the-art methods while approximating human performance.

Published in

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437

Copyright © 2010 ACM


Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
