
Improving Spamdexing Detection Via a Two-Stage Classification Strategy

  • Conference paper
Information Retrieval Technology (AIRS 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)


Abstract

Spamdexing is any of various methods to manipulate the relevancy or prominence of resources indexed by a search engine, usually in a manner inconsistent with the purpose of the indexing system. Combating spamdexing has become one of the top challenges for web search. Machine-learning-based methods have shown their superiority because they adapt easily to newly developed spam techniques. In this paper, we propose a two-stage classification strategy to detect web spam, based on the spamicity predicted by learning algorithms and on hyperlink propagation. Preliminary experiments on the standard WEBSPAM-UK2006 benchmark show that the two-stage strategy is reasonable and effective.
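The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a two-stage scheme of this kind might look like, not the authors' method. It assumes that stage one has already produced a spamicity score in [0, 1] for each host, and that stage two blends each host's score with the mean score of the hosts it links to. The function names, the blending weight `alpha`, the decision threshold, and the example hosts are all hypothetical.

```python
from typing import Dict, List


def propagate_spamicity(
    spamicity: Dict[str, float],
    out_links: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Stage two (assumed): blend each host's own stage-one score with the
    mean stage-one score of the hosts it links to.

    alpha is a hypothetical weight on the host's own score.
    """
    refined: Dict[str, float] = {}
    for host, score in spamicity.items():
        # Only neighbours that received a stage-one score can contribute.
        neighbours = [n for n in out_links.get(host, []) if n in spamicity]
        if neighbours:
            mean_nb = sum(spamicity[n] for n in neighbours) / len(neighbours)
            refined[host] = alpha * score + (1.0 - alpha) * mean_nb
        else:
            # No scored neighbours: keep the stage-one score unchanged.
            refined[host] = score
    return refined


def classify(refined: Dict[str, float], threshold: float = 0.5) -> Dict[str, str]:
    """Turn refined scores into labels using a hypothetical threshold."""
    return {h: ("spam" if s >= threshold else "normal") for h, s in refined.items()}


# Hypothetical stage-one output (e.g. from any probabilistic classifier).
stage_one = {"a.example": 0.9, "b.example": 0.8, "c.example": 0.45, "d.example": 0.1}
# Hypothetical host-level link graph.
links = {
    "a.example": ["b.example"],
    "b.example": ["a.example"],
    "c.example": ["a.example", "b.example"],
    "d.example": [],
}

refined_scores = propagate_spamicity(stage_one, links)
labels = classify(refined_scores)
```

In this toy graph, `c.example` falls just below the threshold after stage one but links only to high-spamicity hosts, so the propagation step pushes it over the threshold. That is the kind of borderline case a second, link-aware stage is meant to catch.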




Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geng, GG., Wang, CH., Li, QD. (2008). Improving Spamdexing Detection Via a Two-Stage Classification Strategy. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_34


  • DOI: https://doi.org/10.1007/978-3-540-68636-1_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68633-0

  • Online ISBN: 978-3-540-68636-1

  • eBook Packages: Computer Science, Computer Science (R0)
