DOI: 10.1145/1531914
AIRWeb '09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
ACM 2009 Proceeding
Publisher:
  • Association for Computing Machinery, New York, NY, United States
Conference:
AIRWeb '09: 5th International Workshop on Adversarial Information Retrieval on the Web, Madrid, Spain, 21 April 2009
ISBN:
978-1-60558-438-6
Published:
21 April 2009

Abstract

Before the advent of the World Wide Web, information retrieval algorithms were developed for relatively small and coherent document collections such as newspaper articles or book catalogs in a library. In comparison to these collections, the Web is massive, far less coherent, changes more rapidly, and is spread over geographically distributed computers. Scaling information retrieval algorithms to the World Wide Web is a challenging task; success to date is reflected in the ubiquitous use of search engines to access Internet content.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving, and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a favorable position in search engine result pages is strongly correlated with more traffic, which often translates to more revenue.

As in previous years, automatic detection of search engine spam was the dominant theme of this workshop. A significant fraction of the accepted papers utilized temporal information to aid in detection of adversarial behavior. In addition to short and long papers that had been accepted in previous years, this year we introduced an additional category: position papers on challenges in Adversarial Information Retrieval, and we were excited to have two papers accepted in that category, as we believe in their potential to stimulate discussion at the workshop and beyond.

Table of Contents
SESSION: Temporal analysis
research-article
Looking into the past to better classify web spam

Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue ...

research-article
A study of link farm distribution and evolution using a time series of web snapshots

In this paper, we study the overall link-based spam structure and its evolution which would be helpful for the development of robust analysis tools and research for Web spamming as a social activity in the cyber space. First, we use strongly connected ...

research-article
Web spam filtering in internet archives

While Web spam is targeted for the high commercial value of top-ranked search-engine results, Web archives observe quality deterioration and resource waste as a side effect. So far Web spam filtering technologies are rarely used by Web archivists but ...

SESSION: Content analysis
research-article
Web spam identification through language model analysis

This paper applies a language model approach to different sources of information extracted from a Web page, in order to provide high quality indicators in the detection of Web Spam. Two pages linked by a hyperlink should be topically related, even ...

research-article
An empirical study on selective sampling in active learning for splog detection

This paper studies how to reduce the amount of human supervision for identifying splogs / authentic blogs in the context of continuously updating splog data sets year by year. Following the previous works on active learning, against the task of splog / ...

research-article
Linked latent Dirichlet allocation in web spam filtering

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a fully generative statistical language model on the content and topics of a corpus of documents. In this paper we apply an extension of LDA for web spam classification. Our linked LDA ...

SESSION: Social spam
research-article
Social spam detection

The popularity of social bookmarking sites has made them prime targets for spammers. Many of these systems require an administrator's time and energy to manually filter or remove spam. Here we discuss the motivations of social spam, and present a study ...

research-article
Tag spam creates large non-giant connected components

Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, including domain-specific features (like ...

SESSION: Spam research collections
research-article
Nullification test collections for web spam and SEO

Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or ...

research-article
Web spam challenge proposal for filtering in archives

In this paper we propose new tasks for a possible future Web Spam Challenge motivated by the needs of the archival community. The Web archival community consists of several relatively small institutions that operate independently and possibly over ...

Contributors
  • Google LLC