Abstract
Web spam techniques enable some web pages or sites to achieve undeserved relevance and importance, and they can seriously degrade search engine ranking results. Combating web spam has become one of the top challenges for web search. This paper proposes using genetic programming to learn a discriminating function for detecting web spam. The evolutionary computation uses multiple populations composed of small-scale individuals and combines the best individuals selected from each population to obtain a potentially optimal discriminating function. Experiments on WEBSPAM-UK2006 show that, compared with SVM, the approach can improve spam classification recall by 26%, F-measure by 11%, and accuracy by 4%.
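The abstract only outlines the method: several small populations evolve independently, and the best individual from each is combined into one discriminating function. The Python sketch below illustrates that general idea under explicit assumptions; it is not the authors' implementation. Everything concrete in it is hypothetical: the expression-tree encoding, the operator set (`add`, `sub`, `mul`), mutation-only variation with truncation selection, F-measure as the fitness, and majority voting as the combination rule (the abstract does not say how the selected individuals are combined). A sample is labeled spam (1) when a tree's output is positive.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical tree encoding: ("x", i) reads feature i, ("c", v) is a constant,
# and (op, left, right) applies a binary operator to two subtrees.
Tree = Tuple
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def random_tree(rng: random.Random, n_features: int, depth: int) -> Tree:
    """Grow a random expression tree whose depth is at most `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))


def evaluate(tree: Tree, x: Sequence[float]) -> float:
    """Compute the discriminating function's raw score for one feature vector."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def f_measure(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, treating label 1 as spam."""
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mutate(tree: Tree, rng: random.Random, n_features: int, depth: int) -> Tree:
    """Replace one randomly chosen subtree while respecting the depth limit."""
    if tree[0] in ("x", "c") or depth == 0 or rng.random() < 0.3:
        return random_tree(rng, n_features, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_features, depth - 1), right)
    return (op, left, mutate(right, rng, n_features, depth - 1))


def evolve_population(X, y, rng, pop_size, generations, max_depth) -> Tree:
    """Evolve one small population and return its fittest tree."""
    n_features = len(X[0])

    def fitness(tree: Tree) -> float:
        return f_measure([1 if evaluate(tree, x) > 0 else 0 for x in X], y)

    pop = [random_tree(rng, n_features, max_depth) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the top half survives unchanged (elitism)
        # and mutated copies of survivors refill the rest.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng, n_features, max_depth)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


def multi_population_gp(X: List[List[float]], y: List[int], n_populations: int = 5,
                        pop_size: int = 20, generations: int = 15,
                        max_depth: int = 3, seed: int = 0
                        ) -> Tuple[List[Tree], Callable[[Sequence[float]], int]]:
    """Evolve independent populations, then combine their best trees."""
    rng = random.Random(seed)
    best = [evolve_population(X, y, rng, pop_size, generations, max_depth)
            for _ in range(n_populations)]

    def classify(x: Sequence[float]) -> int:
        # Assumed combination rule: majority vote of the per-population winners.
        votes = sum(1 for tree in best if evaluate(tree, x) > 0)
        return 1 if 2 * votes > len(best) else 0

    return best, classify


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(100)]
    y = [1 if a + b > 0.2 else 0 for a, b in X]  # toy "spam" labels
    best, classify = multi_population_gp(X, y)
    print("training F-measure:", round(f_measure([classify(x) for x in X], y), 3))
```

Keeping `max_depth` small is one way to mirror the abstract's "small-scale individuals": each population searches over short expressions, and the sketch relies on several independent populations, rather than one large tree, for variety. A real experiment would use WEBSPAM-UK2006 features and a held-out test split instead of the toy data in the `__main__` block.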
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Niu, X., Ma, J., He, Q., Wang, S., Zhang, D. (2010). Learning to Detect Web Spam by Genetic Programming. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_5
DOI: https://doi.org/10.1007/978-3-642-14246-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)