On Web’s contact structure

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The explosive growth of the Web and of social networks has revealed the need to (re-)analyze the connection structure of the underlying graphs. In 1999, Kleinberg et al. defined the Web-graph as the graph induced by the (directed) hyperlinks between (static) Web pages. Since then, the Web-graph model has been the basis for representing the Web's connection structure. In the early ’90s, many authors conjectured that the Web-graph was a scale-free network containing very few huge-degree nodes and many small-degree nodes. Over the years, this conjecture has become established, although with some minor limitations. Our paper aims to contribute to a deeper understanding of some Web dynamics, such as the spreading of information and viruses, the measurement of the real attractiveness of a site, or the estimation of a potential power of influence in propagating ideas or promoting products. Our working hypothesis was that the Web's spreading dynamics are better analyzed by considering the actual contacts between nodes in a time interval, since the physical link has some drawbacks: (a) it is quite static, (b) it may remain inactive, and (c) many accesses to Web resources are performed online, by typing something in an address bar. We show that the Web is still a scale-free network, with three main classes of nodes: very few huge nodes (the hubs), a significant number of intermediate nodes (the mini-hubs), and a huge number of small nodes. Note that mini-hubs occur regularly in the real world: many sites receive millions of daily contacts. They do not reach the size of a hub, but they are not so small that they can be ignored like small-degree nodes. We suspect that they may be responsible for the local spread of viruses or information on the Web.
To extract the data presented in this paper, we analyzed 19 categories of coherent sites, subdivided into 309 sub-categories, for a total of 56,450 different sites. We considered the contacts for the entire month of July 2017, during which the average number of unique visitors was 257,022, who visited an average of 4,050,957 pages.
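The abstract does not state how the scale-free claim was tested, so the following is only a minimal sketch of one standard approach: the continuous maximum-likelihood estimator of the power-law exponent described by Clauset et al. (2009), listed in the references, together with a three-way split of sites into hubs, mini-hubs and small nodes. The function names, the two thresholds and the synthetic sample are all assumptions made for illustration; none of them come from the paper or its data.

```python
import math
import random


def powerlaw_alpha_mle(values, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x_i / x_min)), x_i >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def classify(contacts, hub_min, mini_hub_min):
    """Count sites per class; both thresholds are hypothetical, not taken from the paper."""
    counts = {"hub": 0, "mini_hub": 0, "small": 0}
    for c in contacts:
        if c >= hub_min:
            counts["hub"] += 1
        elif c >= mini_hub_min:
            counts["mini_hub"] += 1
        else:
            counts["small"] += 1
    return counts


def sample_powerlaw(n, alpha, x_min, seed):
    """Draw n synthetic values from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    # Synthetic contact counts only; these are NOT the July 2017 measurements.
    contacts = sample_powerlaw(n=20_000, alpha=2.1, x_min=1.0, seed=42)
    print("estimated alpha:", round(powerlaw_alpha_mle(contacts, x_min=1.0), 3))
    print(classify(contacts, hub_min=1_000.0, mini_hub_min=10.0))
```

On data drawn from a true power law the estimate should land close to the generating exponent. On real contact counts, the choice of `x_min` and a goodness-of-fit test matter at least as much as the point estimate.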

Notes

  1. http://www.worldwidewebsize.com/.

  2. These statistics are drawn from http://www.internetlivestats.com/.

  3. https://www.netcraft.com/active-sites/.

  4. http://publicsuffix.org/list/.

  5. https://www.alexa.com/.

  6. https://www.alexa.com/about/.

  7. http://www.alexa.com/about.

  8. https://support.alexa.com/hc/en-us/articles/200449614.

References

  • Albert R, Jeong H, Barabási AL (1999) Internet: diameter of the world-wide web. Nature 401(6749):130–131

  • Borodin A, Braverman M, Lucier B, Oren J (2017) Strategyproof mechanisms for competitive influence in networks. Algorithmica 78(2):425–452

  • Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the web. Comput Netw 33(1–6):309–320

  • Chen W, Lakshmanan LVS, Castillo C (2013) Information and influence propagation in social networks. Synthesis lectures on data management. Morgan & Claypool Publishers, San Rafael

  • Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703

  • Diestel R (2017) Graph theory. Graduate texts in mathematics, 5th edn. Springer, New York

  • Dill S, Kumar R, McCurley KS, Rajagopalan S, Sivakumar D, Tomkins A (2002) Self-similarity in the web. ACM Trans Internet Technol 2(3):205–223

  • Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’01, pp 57–66

  • Donato D, Laura L, Leonardi S, Millozzi S (2007) The web as a graph: how far we are. ACM Trans Internet Technol 7:1

  • Harary F (1999) Graph theory. Perseus Books, Reading

  • Italiano GF, Parotsidis N, Perekhodko E (2017) What’s inside a bow-tie: analyzing the core of the web and of social networks. In: Proceedings of the 2017 international conference on information system and data mining, ACM, New York, NY, USA, ICISDM ’17, pp 39–43

  • Kempe D, Kleinberg J, Tardos E (2015) Maximizing the spread of influence through a social network. Theory Comput 11(4):105–147

  • Kleinberg JM, Kumar R, Raghavan P, Rajagopalan S, Tomkins AS (1999) The web as a graph: measurements, models, and methods. In: Proceedings of the 5th annual international conference on computing and combinatorics, Springer-Verlag, Berlin, Heidelberg, COCOON’99, pp 1–17

  • Meusel R, Vigna S, Lehmberg O, Bizer C (2015) The graph structure in the web analyzed on different aggregation levels. J Web Sci 1(1):33–47

  • Mossel E, Roch S (2010) Submodularity of influence in social networks: from local to global. SIAM J Comput 39(6):2176–2188

  • Narayanan L, Wu K (2018) How to choose friends strategically. Theor Comput Sci. https://doi.org/10.1016/j.tcs.2018.07.013

  • Newman M (2005) Power laws, Pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351

  • Newman M (2010) Networks: an introduction. Oxford University Press Inc, New York

  • Seeman L, Singer Y (2013) Adaptive seeding in social networks. In: Proceedings of the 2013 IEEE 54th annual symposium on foundations of computer science, IEEE Computer Society, Washington, DC, USA, FOCS ’13, pp 459–468

  • Serrano MA, Maguitman A, Boguñá M, Fortunato S, Vespignani A (2007) Decoding the structure of the www: a comparative analysis of web crawls. ACM Trans Web 1:2

  • Singer Y (2012) How to win friends and influence people, truthfully: influence maximization mechanisms for social networks. In: Proceedings of the fifth ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’12, pp 733–742

  • Tan S, Lu J, Lin Z (2016) Emerging behavioral consensus of evolutionary dynamics on complex networks. SIAM J Control Optim 54:3258–3272

  • van den Bosch A, Bogers T, de Kunder M (2016) Estimating search engine index size variability: a 9-year longitudinal study. Scientometrics 107(2):839–856


Author information

Corresponding author

Correspondence to Alberto Postiglione.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Graphics

Fig. 1 Unique visitors: entire corpus (56,450 sites)

Fig. 2 Pageviews: entire corpus (56,450 sites)

Fig. 3 Unique visitors: 9248 sites with rank \(\le 100,000\)

Fig. 4 Pageviews: 9248 sites with rank \(\le 100,000\)

Fig. 5 Unique visitors: first 1000 sites by rank

Fig. 6 Pageviews: first 1000 sites by rank

Fig. 7 Pageviews: 9248 – \(\log\) sites with rank \(\le 100,000\)

Fig. 8 Pageviews: 54,450 – \(\log\) sites

Fig. 9 Pageviews: 9248 – \(\log^2\) sites with rank \(\le 100,000\)

Fig. 10 Pageviews: 54,450 – \(\log^2\) sites

Fig. 11 Pageviews: 9248 – 2.5% sites with rank \(\le 100,000\)

Fig. 12 Pageviews: 54,450 – 2.5% sites

Fig. 13 Pageviews: 9248 – 5% sites with rank \(\le 100,000\)

Fig. 14 Pageviews: 54,450 – 5% sites

Fig. 15 Pageviews: 9248 – 10% sites with rank \(\le 100,000\)

Fig. 16 Pageviews: 54,450 – 10% sites

Fig. 17 Pageviews: 9248 – 15% sites with rank \(\le 100,000\)

Fig. 18 Pageviews: 54,450 – 15% sites

Fig. 19 Pageviews: 9248 – 25% sites with rank \(\le 100,000\)

Fig. 20 Pageviews: 54,450 – 25% sites

Fig. 21 Pageviews: 9248 – 50% sites with rank \(\le 100,000\)

Fig. 22 Pageviews: 54,450 – 50% sites

Fig. 23 Pageviews: first 25% sites with rank \(\le 100,000\)

Fig. 24 Pageviews: first 25% sites

Fig. 25 Pageviews: second 25% sites with rank \(\le 100,000\)

Fig. 26 Pageviews: second 25% sites

Fig. 27 Pageviews: third 25% sites with rank \(\le 100,000\)

Fig. 28 Pageviews: third 25% sites

Fig. 29 Pageviews: last 25% sites with rank \(\le 100,000\)

Fig. 30 Pageviews: last 25% sites

Fig. 31 Pageviews: 54,450 – 25% random sites

Fig. 32 Pageviews: 54,450 – 50% random sites

Cite this article

Postiglione, A., De Bueriis, G. On Web’s contact structure. J Ambient Intell Human Comput 10, 2829–2841 (2019). https://doi.org/10.1007/s12652-018-1002-1
