Web Spam Identification with User Browsing Graph

  • Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)

Abstract

Combating Web spam has become one of the top challenges for Web search engines. Most previous research on link-based Web spam identification focuses on exploiting hyperlink graphs and the corresponding user-behavior models. However, Web spammers can easily add and remove hyperlinks, which makes the hyperlink graph unreliable. We construct a user browsing graph from users' Web access logs and apply link analysis algorithms to this graph to identify Web spam pages. The constructed graph is much smaller than the original Web graph, so link analysis algorithms run efficiently on it. Comparative experimental results also show that algorithms run on the constructed graph outperform those run on the original graph.
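The pipeline outlined in the abstract (build a graph from browsing logs, then run link analysis on it) might be sketched as follows. This is only an illustration under stated assumptions: the log record format `(user_id, source_url, destination_url)`, the function names, and the choice of a weighted PageRank as the link analysis step are all hypothetical and are not taken from the paper, which does not specify these details in its abstract.

```python
from collections import defaultdict


def build_browsing_graph(log_records):
    """Build a directed, weighted graph from browsing transitions.

    Each record is assumed to be (user_id, source_url, destination_url);
    this format is an assumption for illustration, not the paper's.
    The edge weight counts how often users moved from source to destination.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for _user_id, source, destination in log_records:
        if source and destination and source != destination:
            graph[source][destination] += 1
            graph[destination]  # touch the key so the destination is a node
    return {node: dict(edges) for node, edges in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (a generic stand-in algorithm)."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling_mass = 0.0
        for node in nodes:
            out_edges = graph[node]
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                # Nodes without outgoing transitions spread their rank evenly.
                dangling_mass += rank[node]
                continue
            for destination, weight in out_edges.items():
                share = rank[node] * weight / total_weight
                new_rank[destination] += damping * share
        for node in nodes:
            new_rank[node] += damping * dangling_mass / n
        rank = new_rank
    return rank


# Toy log: three hypothetical users browsing among pages "a", "b", "c".
records = [
    ("u1", "a", "b"),
    ("u2", "a", "b"),
    ("u1", "b", "c"),
    ("u3", "a", "c"),
]
browsing_graph = build_browsing_graph(records)
scores = pagerank(browsing_graph)
```

Because edges exist only where users actually navigated, a page that spammers link to heavily but that nobody visits would receive no incoming weight in such a graph. A trust-propagation variant could be substituted for PageRank by replacing the uniform teleport term with one concentrated on a seed set of known-good pages.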

Supported by the Chinese National Key Foundation Research & Development Plan (2004CB318108), Natural Science Foundation (60621062, 60503064, 60736044) and National 863 High Technology Project (2006AA01Z141).

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, H., Liu, Y., Zhang, M., Ru, L., Ma, S. (2009). Web Spam Identification with User Browsing Graph. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_4

  • DOI: https://doi.org/10.1007/978-3-642-04769-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science, Computer Science (R0)
