
Distrust seed set propagation algorithm to detect web spam

Journal of Intelligent Information Systems

Abstract

Web spam uses numerous techniques to mislead Web search engines in exchange for financial profit. A myriad of semi-automatic propagation models have been proposed to combat Web spam. In this paper, distrust propagation is used to detect Web spam. An automatic distrust seed set propagation algorithm (DSP) is proposed, which extends the seed set so that distrust propagates further and more Web spam is detected. Experiments are conducted on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets; the results show that DSP enhanced the baseline algorithms, detecting 17.73% more spam hosts in the former dataset and 8.59% more spam hosts in the latter.
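
As background for the abstract, the sketch below illustrates generic seed-based distrust propagation: distrust starts at a set of known spam hosts and flows backward along hyperlinks, so hosts that link to spam accumulate distrust. This is not the DSP algorithm itself, whose details the abstract does not give. The function name `propagate_distrust`, the damping value of 0.85, the iteration count, and the toy host graph are all assumptions made for illustration.

```python
def propagate_distrust(links, spam_seeds, damping=0.85, iterations=50):
    """Biased PageRank on the reversed link graph (illustrative sketch only).

    links      -- dict mapping each host to the list of hosts it links to
    spam_seeds -- non-empty set of hosts already labelled as spam
    Returns a dict mapping every host to a distrust score.
    """
    if not spam_seeds:
        raise ValueError("spam_seeds must not be empty")

    # Collect every host that appears as a source or as a link target.
    hosts = set(links)
    for targets in links.values():
        hosts.update(targets)

    # In-degree in the original graph equals out-degree in the reversed graph.
    in_degree = {h: 0 for h in hosts}
    for targets in links.values():
        for t in targets:
            in_degree[t] += 1

    # All teleport mass is concentrated on the distrust seed set.
    seed_mass = {h: (1.0 / len(spam_seeds) if h in spam_seeds else 0.0)
                 for h in hosts}
    score = dict(seed_mass)

    for _ in range(iterations):
        nxt = {h: (1.0 - damping) * seed_mass[h] for h in hosts}
        for src, targets in links.items():
            for t in targets:
                # Distrust flows backward: from target t to the host linking to it.
                nxt[src] += damping * score[t] / in_degree[t]
        score = nxt
    return score


# Hypothetical toy graph: "a" links to a spam host, "b" links to "a",
# "c" is isolated, and "farm" exchanges links with "spam".
toy_links = {
    "a": ["spam"],
    "b": ["a"],
    "c": [],
    "spam": ["farm"],
    "farm": ["spam"],
}
scores = propagate_distrust(toy_links, {"spam"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the seed host receives the highest score, hosts that link toward it receive progressively less, and the isolated host `c` receives none. A larger or better-chosen seed set lets distrust reach more hosts, which is the motivation the abstract gives for extending the seed set.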



Author information

Corresponding author

Correspondence to Kwang Leng Goh.

About this article

Cite this article

Goh, K.L., Patchmuthu, R.K. & Singh, A.K. Distrust seed set propagation algorithm to detect web spam. J Intell Inf Syst 49, 213–235 (2017). https://doi.org/10.1007/s10844-016-0439-y

  • DOI: https://doi.org/10.1007/s10844-016-0439-y
