Distrust seed set propagation algorithm to detect web spam

Goh, Kwang Leng; Patchmuthu, Ravi Kumar; Singh, Ashutosh Kumar

doi:10.1007/s10844-016-0439-y


Published: 10 January 2017

Volume 49, pages 213–235, (2017)

Journal of Intelligent Information Systems

Kwang Leng Goh¹,
Ravi Kumar Patchmuthu² &
Ashutosh Kumar Singh³


Abstract

Web spam uses numerous techniques to mislead Web search engines in exchange for financial profit. A myriad of semi-automatic propagation models have been proposed to combat Web spam. In this paper, distrust propagation is used to detect Web spam. An automatic distrust seed set propagation algorithm (DSP) is proposed, which acts as an extension to the seed set to propagate distrust further and detect more Web spam. Experiments are conducted on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets; the results show that DSP enhanced the baseline algorithms, detecting 17.73 % more spam hosts in the former dataset and 8.59 % more spam hosts in the latter.
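
To make the idea of distrust propagation concrete, the following is a minimal sketch of a generic, Anti-TrustRank-style scheme: distrust starts at a set of known spam hosts and flows backward along hyperlinks, so that hosts linking to spam accumulate a distrust score. It is not the DSP algorithm of this paper, whose details are not given in the abstract. The function name `propagate_distrust`, the damping factor of 0.85, the iteration count, and the toy host names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def propagate_distrust(edges, spam_seeds, damping=0.85, iterations=50):
    """Propagate distrust backward along links from a spam seed set.

    edges: iterable of (source_host, target_host) hyperlinks.
    spam_seeds: hosts known to be spam (hypothetical labels).
    Returns a dict mapping each host to a distrust score.
    """
    seeds = set(spam_seeds)
    if not seeds:
        raise ValueError("spam_seeds must not be empty")

    nodes = set(seeds)
    in_links = defaultdict(list)  # target -> hosts that link to it
    for src, dst in edges:
        nodes.update((src, dst))
        in_links[dst].append(src)

    in_degree = {n: len(in_links[n]) for n in nodes}
    # Distrust is injected only at the seed hosts, split uniformly.
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    score = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for dst in nodes:
            if in_degree[dst] == 0:
                continue  # no in-linking hosts to pass distrust to
            share = damping * score[dst] / in_degree[dst]
            for src in in_links[dst]:
                nxt[src] += share  # a host linking to dst inherits distrust
        score = nxt
    return score


# Toy graph: "b" links to "a", "a" links to the spam seed,
# and "good" -> "c" is unrelated to any spam host.
scores = propagate_distrust(
    [("a", "spam"), ("b", "a"), ("good", "c")],
    spam_seeds={"spam"},
)
```

In this toy graph, the host that links directly to the seed ends up with more distrust than the host two links away, and hosts with no path to the seed receive none. A seed set extension in the spirit of the abstract would enlarge `spam_seeds` before running such a propagation, but how DSP selects the additional hosts is not described here.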




Author information

Authors and Affiliations

Department of Mechanical Engineering, Curtin University, Bentley Campus, Bentley, Australia
Kwang Leng Goh
Department of Computer and Network Engineering, Jefri Bolkiah College of Engineering, Kuala Belait, Brunei
Ravi Kumar Patchmuthu
National Institute of Technology, Kurukshetra, Haryana, India
Ashutosh Kumar Singh


Corresponding author

Correspondence to Kwang Leng Goh.


About this article

Cite this article

Goh, K.L., Patchmuthu, R.K. & Singh, A.K. Distrust seed set propagation algorithm to detect web spam. J Intell Inf Syst 49, 213–235 (2017). https://doi.org/10.1007/s10844-016-0439-y


Received: 04 June 2015
Revised: 11 December 2016
Accepted: 13 December 2016
Published: 10 January 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10844-016-0439-y
