DOI: 10.1145/1526709.1526878
poster

Combining anchor text categorization and graph analysis for paid link detection

Published: 20 April 2009

Abstract

To artificially boost the rank of commercial pages in search engine results, search engine optimizers pay for links to these pages from other websites. Identifying paid links is important for a web search engine that aims to produce highly relevant results. In this paper we introduce a novel method for identifying such links. We first train a classifier of anchor text topics and analyze web pages for the diversity of their outgoing commercial links. We then use this information to analyze the link graph of the Russian Web, finding pages that sell links and sites that buy links, and identifying the paid links themselves. Testing on manually marked samples showed high efficiency of the algorithm.
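
The abstract gives only an outline of the pipeline, so the following is a minimal sketch of how its three stages might fit together, not the authors' implementation. The keyword lists stand in for the trained anchor text classifier, and the function names, the thresholds `min_topics` and `min_sellers`, and the example URLs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword lists standing in for a trained anchor-text classifier.
TOPIC_KEYWORDS = {
    "loans": {"loan", "credit", "mortgage"},
    "travel": {"hotel", "flights", "tour"},
    "pharmacy": {"pills", "pharmacy"},
}


def classify_anchor(anchor):
    """Return a commercial topic for the anchor text, or None if non-commercial."""
    words = set(anchor.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return None


def detect_paid_links(links, min_topics=2, min_sellers=2):
    """links: iterable of (source_page, target_site, anchor_text) tuples.

    Assumed heuristics (not taken from the paper):
      - a page is a suspected seller if its outgoing commercial links
        cover at least `min_topics` distinct topics;
      - a site is a suspected buyer if at least `min_sellers` suspected
        seller pages link to it with commercial anchors;
      - a link is flagged as paid if it goes from a seller to a buyer.
    """
    # Stage 1: keep only links whose anchor text looks commercial.
    commercial = []
    for src, dst, anchor in links:
        topic = classify_anchor(anchor)
        if topic is not None:
            commercial.append((src, dst, topic))

    # Stage 2: measure topic diversity of each page's commercial out-links.
    topics_by_page = defaultdict(set)
    for src, _dst, topic in commercial:
        topics_by_page[src].add(topic)
    sellers = {p for p, ts in topics_by_page.items() if len(ts) >= min_topics}

    # Stage 3: use the link graph to find sites linked from many sellers.
    sellers_by_site = defaultdict(set)
    for src, dst, _topic in commercial:
        if src in sellers:
            sellers_by_site[dst].add(src)
    buyers = {s for s, ps in sellers_by_site.items() if len(ps) >= min_sellers}

    paid = [(src, dst) for src, dst, _t in commercial
            if src in sellers and dst in buyers]
    return sellers, buyers, paid


example_links = [
    ("blog.example/a", "loans.example", "cheap loan online"),
    ("blog.example/a", "trip.example", "hotel deals"),
    ("blog.example/b", "loans.example", "credit today"),
    ("blog.example/b", "pills.example", "buy pills"),
    ("news.example/x", "loans.example", "mortgage rates"),
    ("news.example/x", "wiki.example", "read more"),
]
sellers, buyers, paid = detect_paid_links(example_links)
```

In this toy input, the two blog pages link out on unrelated commercial topics and so are treated as suspected sellers, while the news page has a single commercial topic and is not. A real system would presumably replace the hard thresholds with learned scores.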

      Published In

      WWW '09: Proceedings of the 18th international conference on World wide web
      April 2009
      1280 pages
      ISBN:9781605584874
      DOI:10.1145/1526709

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. categorization
      2. language model
      3. link analysis
      4. machine learning
      5. search engines
      6. web mining

      Qualifiers

      • Poster

      Conference

      WWW '09
      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
