Simple but Effective Porn Query Recognition by k-NN with Semantic Similarity Measure

  • Conference paper
Advances in Data and Web Management (APWeb 2009, WAIM 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5446))

Abstract

Commercial search engines have to place some restrictions on access to sexual information. Rather than filtering porn content directly, we prefer to recognize porn queries and recommend appropriate alternatives, an approach with several potential advantages. Recognizing such queries automatically is not trivial, however, because a query is usually so short that it gives a machine too little information to make a correct decision. In this paper we propose a simple but effective solution for recognizing the porn queries present in a very large query log. Instead of only checking whether a query contains sensitive words, which works in some cases but has obvious limitations, we go a step further and collect and study the semantic content of queries. Our experiments with real data demonstrate that a k-Nearest Neighbor (k-NN) classifier, trained at small cost, achieves impressive classification performance, especially in recall.
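
The abstract does not say how the semantic content of a query is collected or how similarity is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each query has already been expanded into some context text (for example, text associated with it in the log), represents that text as term counts, and uses cosine similarity as a stand-in for the semantic similarity measure. The names `vectorize`, `cosine`, `knn_classify`, the `TRAINING` data, and the labels are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercase, split on whitespace, and count terms (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(context, training, k=3):
    """Majority vote among the k training items most similar to `context`."""
    query_vec = vectorize(context)
    scored = sorted(
        ((cosine(query_vec, vectorize(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled context texts; real training data would come from a query log.
TRAINING = [
    ("adult explicit video site", "porn"),
    ("explicit adult pictures gallery", "porn"),
    ("adult explicit movies download", "porn"),
    ("weather forecast city today", "normal"),
    ("train timetable city station", "normal"),
    ("recipe chicken soup easy", "normal"),
]

label_a = knn_classify("free explicit adult gallery", TRAINING, k=3)
label_b = knn_classify("city weather today", TRAINING, k=3)
```

Because the vote is taken over context text rather than the raw query string, a query whose own words look harmless could still be labeled correctly when its context resembles labeled examples, which is the limitation of pure keyword matching that the abstract points to.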

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, S. et al. (2009). Simple but Effective Porn Query Recognition by k-NN with Semantic Similarity Measure. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb 2009, WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_3

  • DOI: https://doi.org/10.1007/978-3-642-00672-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00671-5

  • Online ISBN: 978-3-642-00672-2

  • eBook Packages: Computer Science, Computer Science (R0)