Using Patterns in the Behavior of the Random Surfer to Detect Webspam Beneficiaries

  • Conference paper
Web Information Systems Engineering – WISE 2010 Workshops (WISE 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6724))

Abstract

To appear in a good position on a search engine’s result list, it is not enough for a webpage to be relevant to the request. It also has to be “popular”. This notion of popularity is computed by the search engine and is related to the links pointing to the webpage.

In order to artificially increase their popularity, webmasters sometimes use malicious techniques referred to as Webspam. It can take many forms and is in constant evolution, but Webspam usually consists of building a specific dedicated structure of spam pages around a given target page.

It is essential for a search engine to address the issue of Webspam; otherwise it will not be able to provide users with fair and reliable results.

In this paper we propose a technique to identify webspam through the frequency language associated with random walks amongst those dedicated structures. We identify the language by calculating the frequency of appearance of k-grams on random walks launched from every node.
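
The idea of counting k-grams on random walks can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which alphabet the walks are written in, so the encoding below ('s' when the walk sits on its start node, 'o' otherwise) is an assumption made purely for illustration, as are the function names, the toy graph, and the parameters `walks`, `length`, and `seed`.

```python
import random
from collections import Counter


def random_walk(graph, start, length, rng):
    """Return the nodes visited by a walk of at most `length` steps."""
    walk = [start]
    node = start
    for _ in range(length):
        successors = graph.get(node, [])
        if not successors:  # dangling node: stop the walk early
            break
        node = rng.choice(successors)
        walk.append(node)
    return walk


def encode(walk):
    """Hypothetical alphabet (an assumption, not taken from the paper):
    's' when the walk is on its start node, 'o' on any other node."""
    start = walk[0]
    return "".join("s" if node == start else "o" for node in walk)


def kgram_frequencies(graph, start, k, walks=100, length=20, seed=0):
    """Relative frequency of each k-gram over `walks` walks launched from `start`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(walks):
        word = encode(random_walk(graph, start, length, rng))
        for i in range(len(word) - k + 1):
            counts[word[i:i + k]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy graph: two pages that only link back to "target", plus two ordinary pages.
toy_graph = {
    "target": ["farm1", "farm2"],
    "farm1": ["target"],
    "farm2": ["target"],
    "honest": ["target", "other"],
    "other": [],
}

# One frequency profile per node, since walks are launched from every node.
profiles = {node: kgram_frequencies(toy_graph, node, k=2) for node in toy_graph}
```

In this toy graph every walk launched from `target` comes back to it every second step, so its 2-gram profile contains only "so" and "os". How such profiles would be compared or thresholded to flag a beneficiary is not described in the abstract and is left open here.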

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Largillier, T., Peyronnet, S. (2011). Using Patterns in the Behavior of the Random Surfer to Detect Webspam Beneficiaries. In: Chiu, D.K.W., et al. Web Information Systems Engineering – WISE 2010 Workshops. WISE 2010. Lecture Notes in Computer Science, vol 6724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24396-7_19

  • DOI: https://doi.org/10.1007/978-3-642-24396-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24395-0

  • Online ISBN: 978-3-642-24396-7

  • eBook Packages: Computer Science, Computer Science (R0)
