Before the advent of the World Wide Web, information retrieval algorithms were developed for relatively small and coherent document collections, such as newspaper articles or the book catalogs of a library. In comparison to these collections, the Web is massive, far less coherent, changes more rapidly, and is spread over geographically distributed computers. Scaling information retrieval algorithms to the World Wide Web is a challenging task, and success to date is reflected in the ubiquitous use of search engines to access Internet content.
Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving, and ranking information from collections in which a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing: malicious attempts to influence the outcome of ranking algorithms in order to obtain an undeservedly high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, since a favorable position in search engine result pages is strongly correlated with more traffic, which often translates to more revenue.
As in previous years, automatic detection of search engine spam was the dominant theme of this workshop. A significant fraction of the accepted papers used temporal information to aid in the detection of adversarial behavior. In addition to the short and long papers accepted in previous years, this year we introduced a new category: position papers on challenges in Adversarial Information Retrieval. We were pleased to accept two papers in that category, as we believe they have the potential to stimulate discussion at the workshop and beyond.
Proceeding Downloads
Looking into the past to better classify web spam
Web spamming techniques aim to achieve undeserved rankings in search results. Extensive research has addressed identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue ...
A study of link farm distribution and evolution using a time series of web snapshots
In this paper, we study the overall link-based spam structure and its evolution, which would be helpful for developing robust analysis tools and for studying Web spamming as a social activity in cyberspace. First, we use strongly connected ...
Web spam filtering in internet archives
While Web spam targets the high commercial value of top-ranked search-engine results, Web archives suffer quality deterioration and resource waste as a side effect. So far, Web spam filtering technologies are rarely used by Web archivists, but ...
Web spam identification through language model analysis
This paper applies a language model approach to different sources of information extracted from a Web page, in order to provide high quality indicators in the detection of Web Spam. Two pages linked by a hyperlink should be topically related, even ...
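The language-model idea sketched in this abstract can be illustrated with a toy example (not the authors' implementation): build smoothed unigram language models for a page and a page it links to, and treat a large divergence between them as a signal that the link is topically unrelated. The pages, smoothing, and divergence choice below are illustrative assumptions.

```python
from collections import Counter
import math

def unigram_lm(tokens, vocab, alpha=1.0):
    # Laplace-smoothed unigram language model over a shared vocabulary.
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # KL(P || Q); both models are smoothed, so no zero probabilities.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Hypothetical source page and the page it links to.
source = "cheap loans credit cheap loans approval".split()
target = "holiday photos from our family trip to the lake".split()
vocab = set(source) | set(target)

divergence = kl_divergence(unigram_lm(source, vocab),
                           unigram_lm(target, vocab))
# A large divergence suggests the two linked pages are topically
# unrelated, one possible indicator of a nepotistic or spam link.
```

In practice such divergences would be computed over many sources of page text (anchor text, titles, content) and combined with other features, as the paper investigates.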
An empirical study on selective sampling in active learning for splog detection
This paper studies how to reduce the amount of human supervision for identifying splogs / authentic blogs in the context of continuously updating splog data sets year by year. Following previous work on active learning, against the task of splog / ...
Linked latent Dirichlet allocation in web spam filtering
Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a fully generative statistical language model on the content and topics of a corpus of documents. In this paper we apply an extension of LDA for web spam classification. Our linked LDA ...
Social spam detection
The popularity of social bookmarking sites has made them prime targets for spammers. Many of these systems require an administrator's time and energy to manually filter or remove spam. Here we discuss the motivations of social spam, and present a study ...
Tag spam creates large non-giant connected components
Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, including domain-specific features (like ...
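The structural observation in the title above can be checked with a simple connected-components pass over a user graph. A minimal union-find sketch, where the graph and node names are illustrative assumptions (e.g., users connected if they share a bookmarked URL), not data from the paper:

```python
def connected_components(edges):
    # Union-find over an undirected graph given as (u, v) edge pairs;
    # returns component sizes, largest first.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for u, v in edges:
        union(u, v)

    comps = {}
    for node in parent:
        comps.setdefault(find(node), set()).add(node)
    return sorted((len(c) for c in comps.values()), reverse=True)

# Toy user-user graph: genuine users tend to merge into one large
# component, while spammer accounts form several mid-size ones.
edges = [("u1", "u2"), ("u2", "u3"),   # genuine users
         ("s1", "s2"), ("s3", "s4")]   # spammer accounts
print(connected_components(edges))  # prints [3, 2, 2]
```

On a real tagging system the interesting signal would be the size distribution of the non-giant components, which the paper examines.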
Nullification test collections for web spam and SEO
Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or ...
Web spam challenge proposal for filtering in archives
In this paper we propose new tasks for a possible future Web Spam Challenge motivated by the needs of the archival community. The Web archival community consists of several relatively small institutions that operate independently and possibly over ...
Cited By
- Papadopoulos S, Bontcheva K, Jaho E, Lupu M and Castillo C (2016). Overview of the Special Issue on Trust and Veracity of Information in Social Media, ACM Transactions on Information Systems, 10.1145/2870630, 34:3, (1-5), Online publication date: 11-Apr-2016.
- Daroczy B, Siklois D, Palovics R and Benczur A Text Classification Kernels for Quality Prediction over the C3 Data Set Proceedings of the 24th International Conference on World Wide Web, (1441-1446)
Recommendations
Fourth international workshop on adversarial information retrieval on the web (AIRWeb 2008)
WWW '08: Proceedings of the 17th international conference on World Wide Web
Adversarial IR in general, and search engine spam in particular, are engaging research topics with real-world impact for Web users, advertisers, and publishers. The AIRWeb workshop will bring researchers and practitioners in these areas together, to ...