
Modeling Traffic on the Web Graph

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6516)

Abstract

Analysis of aggregate and individual Web requests shows that PageRank is a poor predictor of traffic. We use empirical data to characterize properties of Web traffic not reproduced by Markovian models, including both aggregate statistics such as page and link traffic, and individual statistics such as entropy and session size. As no current model reconciles all of these observations, we present an agent-based model that explains them through realistic browsing behaviors: (1) revisiting bookmarked pages; (2) backtracking; and (3) seeking out novel pages of topical interest. The resulting model can reproduce the behaviors we observe in empirical data, especially heterogeneous session lengths, reconciling the narrowly focused browsing patterns of individual users with the extreme variance in aggregate traffic measurements. We can thereby identify a few salient features that are necessary and sufficient to interpret Web traffic data. Beyond the descriptive and explanatory power of our model, these results may lead to improvements in Web applications such as search and crawling.
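
The abstract contrasts memoryless (Markovian) models, such as PageRank's random surfer, with an agent whose browsing has memory: bookmarks, the back button, and interest in novel pages. The Python sketch below illustrates that contrast on a toy graph. It is not the authors' model. The graph generator, the function names `random_graph`, `random_surfer`, `agent_session` and `entropy`, and every parameter value (`teleport`, `p_end`, `p_back`, `p_bookmark`) are hypothetical choices made for illustration. Topical interest is crudely approximated by occasionally bookmarking newly discovered pages, and back-button moves are assumed not to generate logged requests.

```python
import math
import random
from collections import Counter


def random_graph(n, k, seed=0):
    """Hypothetical toy digraph: each node u links to k distinct random nodes other than u."""
    rng = random.Random(seed)
    return {u: rng.sample([v for v in range(n) if v != u], k) for u in range(n)}


def random_surfer(graph, steps, rng, teleport=0.15):
    """Memoryless (Markovian) baseline in the spirit of PageRank's random surfer."""
    page = rng.randrange(len(graph))
    visits = [page]
    for _ in range(steps - 1):
        if rng.random() < teleport:
            page = rng.randrange(len(graph))  # jump to a uniformly random page
        else:
            page = rng.choice(graph[page])    # follow a random out-link
        visits.append(page)
    return visits


def agent_session(graph, bookmarks, rng, p_end=0.1, p_back=0.3, p_bookmark=0.05):
    """One session of a hypothetical agent with memory.

    Starts from a bookmarked page, keeps a back-button stack, and sometimes
    bookmarks newly found pages (a crude stand-in for topical novelty).
    Assumption: back moves hit the browser cache, so only forward clicks
    are recorded as requests. `bookmarks` is mutated in place.
    """
    stack = [rng.choice(bookmarks)]
    visits = [stack[0]]
    while rng.random() >= p_end:                 # session continues
        if len(stack) > 1 and rng.random() < p_back:
            stack.pop()                          # backtrack; not logged
            continue
        page = rng.choice(graph[stack[-1]])      # click a random out-link
        stack.append(page)
        visits.append(page)
        if page not in bookmarks and rng.random() < p_bookmark:
            bookmarks.append(page)               # remember a novel page
    return visits


def entropy(visits):
    """Shannon entropy, in bits, of the distribution of requested pages."""
    total = len(visits)
    return sum(c / total * math.log2(total / c) for c in Counter(visits).values())


if __name__ == "__main__":
    rng = random.Random(42)
    g = random_graph(n=1000, k=5)
    surfer = random_surfer(g, steps=500, rng=rng)
    bookmarks = [0, 1, 2]
    agent = [p for _ in range(50) for p in agent_session(g, bookmarks, rng)]
    print(f"surfer: {len(surfer)} requests, entropy {entropy(surfer):.2f} bits")
    print(f"agent:  {len(agent)} requests, entropy {entropy(agent):.2f} bits")
```

Because the agent keeps restarting from bookmarks and retracing its steps, one would expect its requests to concentrate on fewer pages (lower entropy) than those of the memoryless surfer. The exact numbers depend entirely on the arbitrary parameters above, and the sketch says nothing about how well either walk matches real traffic data.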




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meiss, M.R., Gonçalves, B., Ramasco, J.J., Flammini, A., Menczer, F. (2010). Modeling Traffic on the Web Graph. In: Kumar, R., Sivakumar, D. (eds) Algorithms and Models for the Web-Graph. WAW 2010. Lecture Notes in Computer Science, vol 6516. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18009-5_6


  • DOI: https://doi.org/10.1007/978-3-642-18009-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18008-8

  • Online ISBN: 978-3-642-18009-5

  • eBook Packages: Computer Science, Computer Science (R0)
