DOI: 10.1145/1451983
AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
ACM 2008 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
AIRWeb '08: Fourth International Workshop on Adversarial Information Retrieval on the Web, Beijing, China, 22 April 2008
ISBN:
978-1-60558-159-0
Published:
22 April 2008
Abstract

Before the advent of the World Wide Web, information retrieval algorithms were developed for relatively small and coherent document collections such as newspaper articles or book catalogs in a library. In comparison to these collections, the Web is massive, much less coherent, changes more rapidly, and is spread over geographically distributed computers. Scaling information retrieval algorithms to the World Wide Web is a challenging task. Success to date is depicted by the ubiquitous use of search engines to access Internet content.

From the point of view of a search engine, the Web is a mix of two types of content: the "closed Web" and the "open Web". The "closed Web" comprises a few high-quality controlled collections which a search engine can fully trust. The "open Web," on the other hand, includes the vast majority of Web pages, which lack an authority asserting their quality. The openness of the Web has been the key to its rapid growth and success. However, this openness is also a major source of new challenges for information retrieval methods.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.

SESSION: Usage analysis
research-article
A large-scale study of automated web search traffic

As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a ...

research-article
Identifying web spam with user behavior analysis

Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spam and are incapable and inefficient for newly-appeared spam. With user ...

research-article
Query-log mining for detecting spam

Every day millions of users search for information on the web via search engines, and provide implicit feedback to the results shown for their queries by clicking or not onto them. This feedback is encoded in the form of a query log that consists of a ...

SESSION: Text analysis
research-article
Cleaning search results using term distance features

The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their results, the incentive for spammers to use increasingly sophisticated ...

research-article
Exploring linguistic features for web spam detection: a preliminary study

We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007; we make them publicly available for other researchers. Preliminary analysis seems ...

research-article
Latent Dirichlet allocation in web spam filtering

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a fully generative statistical language model on the content and topics of a corpus of documents. In this paper we apply a modification of LDA, the novel multi-corpus LDA technique for web ...

SESSION: General
research-article
Analysing features of Japanese splogs and characteristics of keywords

This paper focuses on analyzing (Japanese) splogs based on various characteristics of keywords contained in them. We estimate the behavior of spammers when creating splogs from other sources by analyzing the characteristics of keywords contained in ...

research-article
Web spam identification through content and hyperlinks

We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as well as page contents and features. The method is efficient, scalable, and ...

SESSION: Social networks
research-article
Identifying video spammers in online social networks

In many video social networks, including YouTube, users are permitted to post video responses to other users' videos. Such a response can be legitimate or can be a video response spam, which is a video response whose content is not related to the topic ...

research-article
A few bad votes too many?: towards robust ranking in social media

Online social media draws heavily on active reader participation, such as voting or rating of news stories, articles, or responses to a question. This user feedback is invaluable for ranking, filtering, and retrieving high quality content - tasks that ...

research-article
The anti-social tagger: detecting spam in social bookmarking systems

The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: recent post pages, popular pages or specific tag pages can be ...

SESSION: Link analysis
research-article
Robust PageRank and locally computable spam detection features

Since the link structure of the web is an important element in ranking systems on search engines, web spammers widely use the link structure of the web to increase the rank of their pages. Various link-based features of web pages have been introduced ...

Contributors
  • Pompeu Fabra University Barcelona
  • Microsoft Research
  • Google LLC