Web Spam Identification with User Browsing Graph

Yu, Huijia; Liu, Yiqun; Zhang, Min; Ru, Liyun; Ma, Shaoping

doi:10.1007/978-3-642-04769-5_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Combating Web spam has become one of the top challenges for Web search engines. Most previous research on link-based Web spam identification focuses on exploiting hyperlink graphs and the corresponding user-behavior models. However, because Web spammers can easily add and remove hyperlinks, the hyperlink graph is unreliable. We construct a user browsing graph from users' Web access logs and apply link analysis algorithms to this graph to identify Web spam pages. The constructed graph is much smaller than the original Web graph, so link analysis algorithms run efficiently on it. Comparative experimental results also show that algorithms run on the constructed graph outperform those run on the original graph.
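The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log format (one `(user, source, destination)` tuple per observed page transition), the function names `build_browsing_graph` and `pagerank`, the example hosts, and the choice of weighted PageRank as the link analysis algorithm are all assumptions made for illustration, since the abstract does not say which algorithms or log fields were used.

```python
from collections import defaultdict


def build_browsing_graph(log_records):
    """Build a directed, weighted graph from access-log transitions.

    Each record is assumed to be (user_id, source_url, destination_url);
    this format is hypothetical. The edge weight counts how many times
    users were observed moving from source to destination.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for _user, src, dst in log_records:
        if src and dst and src != dst:
            graph[src][dst] += 1
            graph[dst]  # make sure pages with no outgoing edges are nodes too
    return {node: dict(edges) for node, edges in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (a stand-in link analysis step)."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            total = sum(graph[src].values())
            for dst, weight in graph[src].items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical log: only pages that users actually visited enter the graph.
log = [
    ("u1", "a.example", "b.example"),
    ("u2", "a.example", "b.example"),
    ("u1", "b.example", "c.example"),
    ("u3", "c.example", "a.example"),
]
graph = build_browsing_graph(log)
scores = pagerank(graph)
```

Because edges come from observed user transitions rather than from hyperlinks, a page gains score only when real users browse to it, which is the property the abstract relies on. A trust-propagation algorithm seeded with known good pages could be substituted for `pagerank` under the same graph representation.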

Supported by the Chinese National Key Foundation Research & Development Plan (2004CB318108), Natural Science Foundation (60621062, 60503064, 60736044) and National 863 High Technology Project (2006AA01Z141).

Author information

Authors and Affiliations

State Key Lab of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China
Huijia Yu, Yiqun Liu, Min Zhang, Liyun Ru & Shaoping Ma

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
School of Computing, The Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Dawei Song
Microsoft Research Asia, 5F Beijing Sigma Center, 49 Zhichun Road, Haidian District, 100190, Beijing, P.R. China
Chin-Yew Lin
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430, Tokyo, Japan
Akiko Aizawa
School of Literature, Shirayuri College, 1-25 Midorigaoka, Chofu-shi, 182-8525, Tokyo, Japan
Kazuko Kuriyama
Graduate School of Information Science and Technology, Hokkaido University, North 14 West 9, Kita-ku, Sapporo-shi, 060-0814, Hokkaido, Japan
Masaharu Yoshioka
Microsoft Research Asia, 5F Beijing Sigma Center, 49 Zhichun Road, Haidian District, 100190, Beijing, P.R. China
Tetsuya Sakai

About this paper

Cite this paper

Yu, H., Liu, Y., Zhang, M., Ru, L., Ma, S. (2009). Web Spam Identification with User Browsing Graph. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_4

DOI: https://doi.org/10.1007/978-3-642-04769-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5
eBook Packages: Computer Science, Computer Science (R0)
