Abstract
Web spam uses numerous techniques to mislead Web search engines in exchange for financial profit. Many semi-automatic propagation models have been proposed to combat Web spam. In this paper, distrust propagation is used to detect Web spam. An automatic distrust seed set propagation algorithm (DSP) is proposed, which extends the seed set so that distrust propagates further and more Web spam is detected. Experiments were conducted on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets; the results show that DSP enhanced the baseline algorithms, detecting 17.73% more spam hosts in the former dataset and 8.59% more spam hosts in the latter.
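To make the idea of distrust propagation concrete, the following is a minimal sketch of a generic, Anti-TrustRank-style propagation: a biased PageRank run on the reversed link graph, so that distrust flows from known spam hosts back to the hosts that link to them. It is not the DSP algorithm itself, whose seed set extension step is not specified in this abstract. The function name `propagate_distrust`, the damping factor of 0.85, the iteration count, and the toy host names are all assumptions made for illustration.

```python
def propagate_distrust(out_links, spam_seeds, damping=0.85, iterations=50):
    """Spread distrust from spam seeds to the hosts that link to them.

    out_links: dict mapping a host to the list of hosts it links to.
    spam_seeds: set of hosts already labelled as spam.
    Returns a dict mapping each host to a distrust score.
    """
    hosts = set(out_links) | set(spam_seeds)
    for targets in out_links.values():
        hosts.update(targets)

    # Reverse the graph: for each host, record who links to it.
    in_links = {h: [] for h in hosts}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    # All teleport mass sits on the spam seeds (assumed uniform).
    seed_mass = {h: (1.0 / len(spam_seeds) if h in spam_seeds else 0.0)
                 for h in hosts}
    score = dict(seed_mass)

    for _ in range(iterations):
        nxt = {h: (1.0 - damping) * seed_mass[h] for h in hosts}
        for h in hosts:
            parents = in_links[h]
            if not parents:
                continue  # nobody links here, so no one inherits distrust
            share = damping * score[h] / len(parents)
            for p in parents:
                nxt[p] += share
        score = nxt
    return score


# Toy graph (hypothetical hosts): b -> a -> spam, and an unrelated pair c -> d.
toy_graph = {"a": ["spam"], "b": ["a"], "c": ["d"]}
scores = propagate_distrust(toy_graph, {"spam"})
```

In this toy graph, the host linking directly to the spam seed receives more distrust than the host two links away, and the unrelated hosts receive none. A seed set extension step such as DSP would then operate on top of a propagation of this kind.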
Cite this article
Goh, K.L., Patchmuthu, R.K. & Singh, A.K. Distrust seed set propagation algorithm to detect web spam. J Intell Inf Syst 49, 213–235 (2017). https://doi.org/10.1007/s10844-016-0439-y
DOI: https://doi.org/10.1007/s10844-016-0439-y