Summary of WWW characterizations

Abstract

To date there have been a number of efforts that attempt to characterize various aspects of the World Wide Web. This paper presents a summary of these efforts, highlighting regularities and insights that have been discovered across the variety of access points available for instrumentation. Characterizations that are derived from client, proxy, and server instrumentation are reviewed as well as efforts to characterize the entire structure of the WWW. Given the dynamic nature of the Web, it may be surprising for some readers to find that many properties of the Web follow regular and predictable patterns that have not changed in form over the Web's lifetime. Understanding these aspects as well as those that vary is critical to designing a better Web, and as a direct consequence, creating a more enjoyable user experience.

Cite this article

Pitkow, J.E. Summary of WWW characterizations. World Wide Web 2, 3–13 (1999). https://doi.org/10.1023/A:1019284202914

  • DOI: https://doi.org/10.1023/A:1019284202914
