DOI: 10.1145/1341531.1341543
research-article

Ranking web sites with real user traffic

Published: 11 February 2008

ABSTRACT

We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. This complex dynamic network reveals a number of interesting structural properties, some consistent with the well-studied Boolean link host graph and others pointing to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable from past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to validate the PageRank random surfing model. The ranking obtained from the actual frequency with which users visit a site differs significantly from the one approximated by the uniform surfing/teleportation behavior modeled by PageRank, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show how each is violated by actual user behavior.
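The comparison described above can be sketched on a toy host graph. On one side is the ranking induced by how often users actually visit each site. On the other is the ranking from PageRank's uniform random-surfer model. This is a minimal illustration under stated assumptions, not the authors' pipeline:

- The click log `CLICKS` and its host names are invented.
- Visit frequency is approximated by the number of clicks arriving at a host. This ignores visits with no referring host, such as bookmarks or typed URLs.
- The damping factor 0.85 is the conventional default, not a value taken from the paper.
- Kendall's tau-b is our own choice of rank-correlation measure.

The PageRank function ignores click counts. It follows each distinct out-link with equal probability and teleports uniformly, which corresponds to the Boolean host graph view. The traffic ranking uses the click weights.

```python
import math
from collections import Counter, defaultdict

# Hypothetical click log: (referring host, destination host). Invented data.
CLICKS = [
    ("search.example", "news.example"),
    ("search.example", "news.example"),
    ("search.example", "shop.example"),
    ("news.example", "blog.example"),
    ("blog.example", "news.example"),
    ("shop.example", "search.example"),
]

def build_host_graph(clicks):
    """Return (hosts, weights); weights[(src, dst)] counts clicks on that edge."""
    weights = Counter(clicks)
    hosts = sorted({h for edge in weights for h in edge})
    return hosts, weights

def pagerank(hosts, weights, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on the Boolean graph: click counts are ignored,
    each distinct out-link gets an equal share, teleportation is uniform, and
    the mass of hosts with no out-links is spread uniformly over all hosts."""
    out = defaultdict(set)
    for src, dst in weights:
        out[src].add(dst)
    n = len(hosts)
    pr = {h: 1.0 / n for h in hosts}
    for _ in range(max_iter):
        dangling = sum(pr[h] for h in hosts if not out[h])
        nxt = {h: (1.0 - alpha) / n + alpha * dangling / n for h in hosts}
        for src in hosts:
            targets = out[src]
            for dst in targets:
                nxt[dst] += alpha * pr[src] / len(targets)
        if sum(abs(nxt[h] - pr[h]) for h in hosts) < tol:
            return nxt
        pr = nxt
    return pr

def traffic_counts(hosts, weights):
    """Visit-frequency proxy: total clicks arriving at each host (in-strength)."""
    visits = {h: 0 for h in hosts}
    for (_, dst), w in weights.items():
        visits[dst] += w
    return visits

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, O(n^2); fine for small examples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    c = d = tx = ty = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])
            dy = (y[i] > y[j]) - (y[i] < y[j])
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from both margins
            if dx == 0:
                tx += 1
            elif dy == 0:
                ty += 1
            elif dx == dy:
                c += 1
            else:
                d += 1
    denom = math.sqrt((c + d + tx) * (c + d + ty))
    return (c - d) / denom if denom else math.nan

hosts, weights = build_host_graph(CLICKS)
pr = pagerank(hosts, weights)
visits = traffic_counts(hosts, weights)
tau = kendall_tau_b([pr[h] for h in hosts], [visits[h] for h in hosts])
print(f"Kendall tau-b between PageRank and traffic ranking: {tau:.3f}")
```

You can probe which of PageRank's assumptions drives the gap between the two rankings. One option is to split each host's PageRank over its out-links in proportion to click counts instead of uniformly. Another is to bias the teleportation vector toward frequently visited hosts.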


          Published in

          WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining
          February 2008, 270 pages
          ISBN: 9781595939272
          DOI: 10.1145/1341531

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 498 of 2,863 submissions, 17%
