research-article

Ranking web sites with real user traffic

Authors:
Mark R. Meiss (Indiana University, Bloomington, IN)
Filippo Menczer (Indiana University, Bloomington, IN & ISI Foundation, Torino, Italy)
Santo Fortunato (ISI Foundation, Torino, Italy)
Alessandro Flammini (Indiana University, Bloomington, IN)
Alessandro Vespignani (Indiana University, Bloomington, IN & ISI Foundation, Torino, Italy)

WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining, February 2008, Pages 65–76. https://doi.org/10.1145/1341531.1341543

Published: 11 February 2008

ABSTRACT

We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied boolean link host graph and others pointing to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable by past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to validate the PageRank random surfing model. The ranking obtained by the actual frequency with which a site is visited by users differs significantly from that approximated by the uniform surfing/teleportation behavior modeled by PageRank, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show how each is violated by actual user behavior.
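The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the four hosts and their click counts are invented, the damping factor of 0.85 is the conventional default rather than a value taken from the paper, and the helper names (`pagerank`, `traffic_score`, `kendall_tau`) are hypothetical. The sketch computes PageRank with uniform teleportation on the boolean link graph, ranks the same hosts by observed incoming clicks, and measures agreement between the two rankings with Kendall's tau.

```python
from itertools import combinations

# Hypothetical toy data: CLICKS[(src, dst)] is the number of observed user
# clicks from host src to host dst. Hosts and counts are invented.
CLICKS = {
    ("a", "b"): 50,
    ("a", "c"): 5,
    ("b", "c"): 40,
    ("c", "a"): 10,
    ("d", "a"): 1,
    ("d", "c"): 30,
}


def pagerank(edges, damping=0.85, iterations=100):
    """Power iteration for PageRank on the unweighted (boolean) link graph."""
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Uniform teleportation: every host receives the same baseline share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A host with no out-links spreads its rank over all hosts.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def traffic_score(clicks):
    """Score each host by the total number of clicks arriving at it."""
    score = {n: 0 for edge in clicks for n in edge}
    for (_, dst), count in clicks.items():
        score[dst] += count
    return score


def kendall_tau(x, y):
    """Kendall's tau-a between two score dicts with the same keys (no tie correction)."""
    concordant = discordant = 0
    for u, v in combinations(sorted(x), 2):
        product = (x[u] - x[v]) * (y[u] - y[v])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


pr = pagerank(CLICKS.keys())
traffic = traffic_score(CLICKS)
tau = kendall_tau(pr, traffic)

print("PageRank order:", sorted(pr, key=pr.get, reverse=True))
print("Traffic order: ", sorted(traffic, key=traffic.get, reverse=True))
print("Kendall tau:   ", round(tau, 3))
```

In this toy graph the link structure alone places host `a` above host `b`, while the click counts reverse that pair. A tau below 1 captures this kind of disagreement between a topological ranking and a traffic-based one.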



Published in
WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining
February 2008, 270 pages
ISBN: 9781595939272
DOI: 10.1145/1341531
General Chair: Marc Najork (Microsoft, USA)
Program Chairs: Andrei Broder (Yahoo!, USA), Soumen Chakrabarti (IIT Bombay, India)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: navigation, pagerank, ranking, search, teleportation, web traffic, weighted host graph