
Ranking billions of web pages using diodes

Published: 01 August 2009

Abstract

Introduction
Because of the Web's rapid growth and lack of central organization, Internet search engines play a vital role in helping Web users retrieve relevant information from the tens of billions of documents available. With millions of dollars of potential revenue at stake, commercial Web sites compete fiercely to be placed prominently within the first page returned by a search engine. As a result, search engine optimizers (SEOs) have developed various search engine spamming (or spamdexing) techniques to artificially inflate the rankings of Web pages. Link-based ranking algorithms, such as Google's PageRank, have been largely effective against most conventional spamming techniques.
However, PageRank has three fundamental flaws that, when exploited aggressively, can prove to be its Achilles' heel: first, PageRank gives a minimum guaranteed score to every page on the Web; second, it rewards all incoming links as valid endorsements; and third, it imposes no penalty for linking to low-quality pages. SEOs can push these shortcomings to the extreme by employing an Artificial Web, a collection of an extremely large number of computer-generated Web pages containing many links to only a few target pages. Each page of the Artificial Web collects the minimum PageRank and feeds it back to the target pages. Although each individual endorsement is small, the flaws of PageRank make it possible for an Artificial Web to accumulate sizable PageRank values for the target pages. SEOs can even download a substantial portion of the real Web and modify only the destinations of the hyperlinks, thus circumventing any detection algorithm based on the quality or the size of pages. Because the size of an Artificial Web can be comparable to that of the real Web, SEOs can seriously compromise the objectivity of the results that PageRank provides. Although some statistical measures can be employed to identify specific attributes associated with an Artificial Web and filter its pages out of search results, it is far more desirable to develop a new ranking model that is free of such exploits to begin with.
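The effect of the first two flaws can be illustrated with a toy calculation. The following is a minimal sketch, not the article's own method: it assumes the standard power-iteration form of PageRank with a damping factor of 0.85, and the four-page graph, the `farm0`, `farm1`, ... page names, and the helper `with_link_farm` are all hypothetical. It prints the score of a target page next to the guaranteed minimum score as generated pages linking only to the target are added.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        # Every page starts from the guaranteed minimum share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for q in links[p]:
                # Each outlink passes on an equal fraction of the linking page's rank.
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


def with_link_farm(real_web, target, farm_size):
    """Copy real_web and add farm_size generated pages that link only to target."""
    web = {p: list(out) for p, out in real_web.items()}
    for i in range(farm_size):
        web[f"farm{i}"] = [target]
    return web


# Hypothetical toy graph: no real page links to "target".
real_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "target": ["a"],
}

for farm_size in (0, 10, 100):
    web = with_link_farm(real_web, "target", farm_size)
    scores = pagerank(web)
    floor = (1.0 - 0.85) / len(web)
    print(farm_size, round(scores["target"], 4), round(floor, 4))
```

In this sketch no farm page receives any endorsement of its own, yet each still holds the minimum score and passes most of it to the target, so the target's share of the total rank grows with the size of the farm.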


Published In

Communications of the ACM, Volume 52, Issue 8
A Blind Person's Interaction with Technology
August 2009
132 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1536616

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Popular
  • Refereed

