Abstract
To rank well in a search engine’s result list, it is not enough for a page to be relevant to the query. The page also has to be “popular”. This notion of popularity is computed by the search engine and depends on the links pointing to the webpage.
To artificially increase their popularity, webmasters sometimes use malicious techniques referred to as Webspam. Webspam takes many forms and evolves constantly, but it usually consists of building a dedicated structure of spam pages around a given target page.
It is important for a search engine to address Webspam; otherwise it cannot provide users with fair and reliable results.
In this paper we propose a technique to identify Webspam through the frequency language associated with random walks among those dedicated structures. We identify the language by computing the frequency of appearance of k-grams on random walks launched from every node.
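The k-gram counting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which alphabet the k-grams are built over, so this example assumes each node carries a symbolic label (the `labels` mapping, the function names, and the toy graph are all hypothetical and are not taken from the paper). It launches uniform random walks from every node, reads each walk as a word of labels, and returns the relative frequency of every k-gram observed.

```python
import random
from collections import Counter


def random_walk(graph, start, length, rng):
    """Follow uniformly random out-links from `start` for at most `length` steps."""
    walk = [start]
    node = start
    for _ in range(length):
        successors = graph.get(node, [])
        if not successors:
            break  # dead end: stop the walk early
        node = rng.choice(successors)
        walk.append(node)
    return walk


def kgram_frequencies(graph, labels, k, walk_length, walks_per_node, seed=0):
    """Relative frequency of label k-grams over walks launched from every node.

    `labels` maps a node to a symbol; this labelling is an assumption made
    for illustration, not the encoding used by the authors.
    """
    rng = random.Random(seed)
    counts = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            word = [labels[n] for n in random_walk(graph, start, walk_length, rng)]
            # Slide a window of size k over the word read along the walk.
            for i in range(len(word) - k + 1):
                counts[tuple(word[i:i + k])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy structure (hypothetical): a target page "t" that exchanges links
# with two supporting pages "s1" and "s2".
graph = {"t": ["s1", "s2"], "s1": ["t"], "s2": ["t"]}
labels = {"t": "T", "s1": "S", "s2": "S"}
freqs = kgram_frequencies(graph, labels, k=2, walk_length=4, walks_per_node=10)
for gram, freq in sorted(freqs.items()):
    print("".join(gram), round(freq, 3))
```

On this toy structure every walk alternates between the target and a supporting page, so only the 2-grams `TS` and `ST` occur. A tightly knit structure of this kind therefore yields a very concentrated k-gram distribution, which is the sort of signal a frequency-based comparison could pick up.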
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Largillier, T., Peyronnet, S. (2011). Using Patterns in the Behavior of the Random Surfer to Detect Webspam Beneficiaries. In: Chiu, D.K.W., et al. Web Information Systems Engineering – WISE 2010 Workshops. WISE 2010. Lecture Notes in Computer Science, vol 6724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24396-7_19
DOI: https://doi.org/10.1007/978-3-642-24396-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24395-0
Online ISBN: 978-3-642-24396-7
eBook Packages: Computer Science, Computer Science (R0)