
World Wide Web, Graph Structure

  • Reference work entry
Encyclopedia of Complexity and Systems Science

Glossary

Graph:

A set of nodes (vertices) connected by links (edges, arcs). In the Web graph, the nodes are webpages, and the edges are the hyperlinks between them.

Indegree:

The number of incoming edges to a node; in the case of the Web, it is the number of webpages pointing to a page.

Outdegree:

The number of outgoing edges; in the case of the Web, it is the number of webpages a webpage points to.
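
Both quantities can be read directly off a list of hyperlinks. The following is a minimal sketch, assuming the Web graph is given as a list of (source, target) page pairs; the function name `degrees` and the toy graph are hypothetical. Because the definitions count webpages rather than individual links, repeated links between the same two pages are counted once.

```python
from collections import Counter


def degrees(edges):
    """Return (indegree, outdegree) Counters for a list of (source, target) hyperlinks.

    Repeated links between the same pair of pages are counted once, so the
    indegree is the number of distinct pages pointing to a page.
    """
    indegree, outdegree = Counter(), Counter()
    for source, target in set(edges):
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


# Hypothetical toy Web graph: page "a" links to "b" and "c", and so on.
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
indeg, outdeg = degrees(links)
print(indeg["c"], outdeg["a"])  # prints 2 2
```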

Strongly connected component:

A set of webpages such that any page can be reached from any other page by following hyperlinks.
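
Strongly connected components can be found in linear time. The sketch below uses Kosaraju's two-pass depth-first search, one standard choice assumed here for illustration (it is not necessarily the procedure used in the studies this entry surveys); the input is a hypothetical list of (source, target) hyperlinks, and the searches are iterative so that long chains of links do not exhaust Python's recursion limit.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm: return a list of SCCs (sets of pages) of a directed graph."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: iterative DFS on the original links, recording pages in order of finishing.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all out-links explored: the page is finished
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed links, taking pages in decreasing finishing order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


links = [("a", "b"), ("b", "a"), ("b", "c")]
print(sorted(sorted(c) for c in strongly_connected_components(links)))  # prints [['a', 'b'], ['c']]
```

Page "c" forms its own component: it can be reached from "a" and "b", but it has no link leading back.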

Weakly connected component:

A set of webpages such that any page can be reached from any other page by treating the hyperlinks as undirected.
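
Because link direction is ignored, weakly connected components can be computed with a union-find (disjoint-set) structure over the hyperlinks. This is a minimal sketch under the same assumption of a hypothetical (source, target) edge list; the function name is illustrative.

```python
def weakly_connected_components(edges):
    """Union-find over hyperlinks treated as undirected; returns a list of sets of pages."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


links = [("a", "b"), ("c", "b"), ("d", "e")]
print(sorted(sorted(c) for c in weakly_connected_components(links)))
# prints [['a', 'b', 'c'], ['d', 'e']]
```

Pages "a" and "c" share a weakly connected component even though neither can reach the other by following links forward.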

URL:

A Uniform Resource Locator: a unique address that corresponds to an online information source.

Hypertext transfer protocol (HTTP):

The communications protocol that allows web clients to communicate with web servers.
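
HTTP exchanges are plain-text messages: the client sends a request line and headers, and the server answers with a status line, headers and the page content. The sketch below only formats a minimal HTTP/1.1 GET request and parses a response status line; it does not open a network connection, and the helper names are hypothetical.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request for `path` on `host`."""
    # Each line ends in CRLF; HTTP/1.1 requires a Host header; a blank line ends the headers.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_status_line(line):
    """Split a response status line such as 'HTTP/1.1 200 OK' into (version, code, reason)."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be empty
    return parts[0], int(parts[1]), reason


print(parse_status_line("HTTP/1.1 404 Not Found"))  # prints ('HTTP/1.1', 404, 'Not Found')
```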

Web server:

A program that responds to requests for web pages.

Webpage:

An information resource, identified by a URL, that is usually but not necessarily in the HTML (Hypertext Markup Language) format. A webpage may be static, meaning that it is stored on the server as a document, or dynamic, meaning that it is generated at the point that it is requested by the browser, using scripts and/or back‐end databases.
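
The difference between static and dynamic pages can be illustrated with a small web application written against Python's standard WSGI interface. This is a hedged sketch, not how any particular site works: the stored page, the `/time` path and the `request` helper are all hypothetical, and the application is called directly rather than through a running server.

```python
from datetime import datetime, timezone
from wsgiref.util import setup_testing_defaults

# Hypothetical stored document, standing in for a static HTML file on the server.
STATIC_PAGES = {"/about.html": b"<html><body>About this site</body></html>"}


def app(environ, start_response):
    """WSGI application: serve one static page and one dynamically generated page."""
    path = environ.get("PATH_INFO", "/")
    if path in STATIC_PAGES:
        body = STATIC_PAGES[path]  # static: returned exactly as stored
    elif path == "/time":
        # dynamic: built when requested (a real site might run a script or query a database here)
        now = datetime.now(timezone.utc).isoformat()
        body = f"<html><body>Generated at {now}</body></html>".encode("utf-8")
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


def request(path):
    """Call the application directly with a test environ; returns (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        return lambda data: None  # WSGI expects a write() callable; unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(request("/about.html")[0])  # prints 200 OK
```

To answer real HTTP requests, the same `app` could be passed to the standard library's development server, e.g. `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.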

Domain:

A name that identifies one or more IP addresses, e. g. umich.edu.

Top level domain:

The suffix of the domain name, sometimes corresponding to the purpose of the website or the country of origin for the website, e. g. “edu”, “com”, “gov”, “uk”, “cn”, etc.
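
The host name and its top level domain can be pulled out of a URL with the standard library's `urllib.parse.urlsplit`. This is a naive sketch under a loudly stated assumption: the top level domain is taken to be the last dot-separated label of the host name. Separating the registrable domain (such as “umich.edu”) in general requires a public-suffix list, since suffixes such as “co.uk” span two labels, and IP-address hosts are not handled.

```python
from urllib.parse import urlsplit


def domain_and_tld(url):
    """Return (hostname, top level domain) for a URL.

    Naive assumption: the host is a domain name (not an IP address) and its
    top level domain is the last dot-separated label.
    """
    host = urlsplit(url).hostname or ""  # .hostname is lowercased and excludes the port
    return host, host.rsplit(".", 1)[-1]


print(domain_and_tld("http://www.umich.edu/index.html"))  # prints ('www.umich.edu', 'edu')
```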

Website:

A collection of webpages, hosted on one or more web servers, that share a common root URL, e. g. “www.springer.com”.
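
When pages are aggregated into websites, the root URL can serve as the grouping key. The sketch below assumes, for simplicity, that the host name of each page URL is its root URL; it therefore does not merge a site split across several host names, nor separate several sites sharing one host. The function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_website(urls):
    """Group page URLs by host name, used here as a simple proxy for the root URL."""
    sites = defaultdict(list)
    for url in urls:
        root = urlsplit(url).hostname or ""  # lowercased, without scheme or port
        sites[root].append(url)
    return dict(sites)


pages = ["http://www.springer.com/gp/", "http://www.springer.com/journal/1", "http://www.cnn.com/tech"]
print({site: len(p) for site, p in group_by_website(pages).items()})
# prints {'www.springer.com': 2, 'www.cnn.com': 1}
```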

Randomized network:

A network obtained from an original network by rewiring its edges while preserving the degrees of each node.
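
One common way to build such a network, assumed here for illustration, is repeated edge swapping: pick two links a→b and c→d and replace them with a→d and c→b. Every page keeps its indegree and outdegree, while who links to whom is shuffled. The sketch rejects swaps that would create a self-link or a duplicate link; it assumes the input has no duplicate links and at least two links, and the function names, number of swaps and seed are hypothetical parameters.

```python
import random
from collections import Counter


def rewire(edges, swaps, seed=None):
    """Degree-preserving randomization of a directed graph by repeated edge swaps."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # reject: would create a self-link or a duplicate link
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)  # sources keep outdegree, targets keep indegree
    return edges


def degree_sequences(edges):
    """Return (outdegree, indegree) Counters, the invariant the rewiring must preserve."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)


original = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("b", "d")]
randomized = rewire(original, swaps=200, seed=7)
print(degree_sequences(randomized) == degree_sequences(original))  # prints True
```

Comparing a measured property of the real Web graph (for example reciprocity or the frequency of a small subgraph) against many such randomized copies indicates whether the property goes beyond what the degree sequence alone would produce.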

Copyright information

© 2009 Springer-Verlag

About this entry

Cite this entry

Adamic, L.A. (2009). World Wide Web, Graph Structure. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_591
