Learning to Detect Web Spam by Genetic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

Web spam techniques enable some web pages or sites to achieve undeserved relevance and importance, and they can seriously degrade search engine ranking results. Combating web spam has become one of the top challenges for web search. This paper proposes using genetic programming to learn a discriminating function that detects web spam. The evolutionary computation uses multiple populations of small-scale individuals and combines the best individuals selected from each population to obtain a potentially best discriminating function. Experiments on WEBSPAM-UK2006 show that, compared with SVM, the approach improves spam classification recall by 26%, F-measure by 11%, and accuracy by 4%.
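The multi-population idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the function set, the fitness measure, the genetic operators, or the rule for combining the best individuals, so everything below is an assumption chosen for brevity. The sketch assumes expression trees over `+`, `-`, and `*`, accuracy as fitness, mutation-only evolution, and summation as the combination rule. All function names (`random_tree`, `evolve_population`, `learn_discriminant`, and so on) are hypothetical, and the toy data stands in for real spam features.

```python
import operator
import random

# Assumed function set; the abstract does not specify one.
OPS = [operator.add, operator.sub, operator.mul]


def random_tree(n_features, depth, rng):
    """Grow a small random expression tree over features and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(OPS),
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def evaluate(tree, x):
    """Evaluate a tree on one feature vector."""
    tag = tree[0]
    if tag == "x":
        return x[tree[1]]
    if tag == "c":
        return tree[1]
    return tag(evaluate(tree[1], x), evaluate(tree[2], x))


def accuracy(tree, X, y):
    """Fraction of samples where (tree(x) > 0) matches the label (1 = spam)."""
    hits = sum((evaluate(tree, x) > 0) == bool(label) for x, label in zip(X, y))
    return hits / len(X)


def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a fresh small tree."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))


def evolve_population(X, y, rng, size=20, generations=15):
    """Evolve one population of small trees and return its best individual."""
    n_features = len(X[0])
    pop = [random_tree(n_features, 2, rng) for _ in range(size)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, X, y), reverse=True)
        survivors = pop[: size // 2]
        children = [mutate(rng.choice(survivors), n_features, rng)
                    for _ in range(size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda t: accuracy(t, X, y))


def learn_discriminant(X, y, n_populations=4, seed=0):
    """Evolve several populations, then combine their best individuals."""
    rng = random.Random(seed)
    best = [evolve_population(X, y, rng) for _ in range(n_populations)]
    # Assumed combination rule: sum the per-population best trees.
    combined = best[0]
    for tree in best[1:]:
        combined = (operator.add, combined, tree)
    # Keep whichever candidate scores highest on the training data.
    return max(best + [combined], key=lambda t: accuracy(t, X, y))


# Toy data: two made-up features, labelled spam when the first exceeds the second.
data_rng = random.Random(42)
X = [(data_rng.random(), data_rng.random()) for _ in range(60)]
y = [1 if a > b else 0 for a, b in X]

model = learn_discriminant(X, y, seed=0)
print("training accuracy:", accuracy(model, X, y))
```

A page is classified as spam when the learned function evaluates to a positive number. A fuller version would add crossover and would select on held-out data using recall and F-measure, the metrics reported in the abstract, rather than training accuracy alone.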




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Niu, X., Ma, J., He, Q., Wang, S., Zhang, D. (2010). Learning to Detect Web Spam by Genetic Programming. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_5

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)
