Skip to main content
Log in

Robustness of social and web graphs to node removal

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

Given a social network, which of its nodes have a stronger impact in determining its structure? More precisely, which node-removal order has the greatest impact on the network structure? We approach this well-known problem for the first time in a setting that combines both web graphs and social networks. Our experiments are performed on datasets that are of orders of magnitude larger than those appearing in the previous literature: this is possible, thanks to some recently developed algorithms and software tools that approximate accurately the number of reachable pairs and the distribution of distances in large graphs. Our experiments highlight deep differences in the structure of social networks and web graphs, show significant limitations of previous experimental results; at the same time, they reveal clustering by label propagation as a new and very effective way of locating nodes that are important from a structural viewpoint.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. The reader might find this definition a bit vague, and some variants are often spotted in the literature: this is a general problem, also highlighted recently by Li et al. (2005).

  2. It should be remarked that the Milgram’s experiment tried to prove two properties at the same time. First, the average distance between individuals is much smaller than expected; second, the individuals are able to exploit such a feature to route messages along short paths, albeit they only possess local information about the network they live in. This second property is, in a sense, not only more interesting than the former, but also more difficult to describe and study, because it has to do with some information that the nodes possess about the environment they inhabit.

  3. A reachable pair is a pair of nodes \(\langle x, y\rangle\) such that there is a directed path from x to y; the distance distribution of a graph is a discrete distribution that gives, for every t, the fraction of reachable pairs of nodes that are at distance t.

  4. Observe that we delete nodes but count the percentage of arcs (rather than nodes) that have been removed: this choice is justified by the fact that otherwise node orderings that put large-degree nodes first would certainly be considered (unfairly) more disruptive.

  5. We remark that several proposals have been made to find features that highlight such structural differences in a computationwise-feasible way (e.g., assortative mixing (Newman and Park 2003)), but all instances we are aware of have been questioned by the subsequent literature, so no clear-cut results are known so far. An exception is the idea of considering the spid [shortest-path index of dispersion (Boldi et al. 2011a)], which is experimentally larger than the one for web graphs and smaller than the one for social networks. For instance, the spid of the entire Facebook graph is 0.09 (Backstrom et al. 2012).

  6. Actually, the notion had been introduced before by Marchiori and Latora (2000) and named connectivity length, but we find the name “harmonic diameter” much more insightful.

  7. We purposely use the word “divergence” between distributions, instead of “distance”, to avoid confusion with the notion of distance in a graph.

  8. It is mostly a matter of taste whether to use directed symmetric graphs or simple undirected graphs. In our case, since we have to cope with both directed and undirected graph, we prefer to speak of directed graphs that are symmetric, that is, for every arc xy, there is a symmetric arc yx.

  9. Label propagation has been independently proposed under the name of peer pressure clustering by Gilbert et al. (2007).

  10. There are sampled variants of Brandes’s algorithm (Brandes and Erlebach 2005a), but the Hoeffding bound providing precision guarantees requires \(\Uptheta(n^4\log n/\varepsilon^2)\) visits to obtain absolute precision \(\varepsilon. \)

  11. In Boldi et al. (2011b), we also presented the outcomes of similar experiments performed on other networks (Amazon, Enron and . it ) that agree with the ones shown here. Our tables and graphs slightly differ from those previously published in (Boldi et al. 2011b), because we had time to generate more runs, and thus, increase the precision of our results.

  12. http://law.di.unimi.it/. In particular, the graphs we used are the datasets named hollywood-2011, ljournal-2008, orkut-2007, in-2004 and uk-2007-05. Note that isolated nodes have been removed from hollywood-2011.

  13. http://webgraph.di.unimi.it/.

  14. We remark that in some cases, the measure is negative or does not decrease monotonically. This is sometimes an artifact of the probabilistic technique used to estimate our measures—small relative errors are unavoidable.

  15. In this paper, like in all the other experimental research on the same topic, conclusions about social networks should be taken with a grain of salt, due to the heterogeneity of such networks and the lack of a large repertoire of examples.

References

  • Albert R, Jeong H, Barabási A-L (2000) Error and attack tolerance of complex networks. Nature 406:378–382

    Article  Google Scholar 

  • Anthonisse JM (1971) The rush in a graph. Technical report, University of Amsterdam Mathematical Centre, Amsterdam

  • Backstrom L, Boldi P, Rosa M, Ugander J, Vigna S (2012) Four degrees of separation. In: ACM Web Science 2012: Conference Proceedings, pp 45–54 (Best paper award)

  • Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 57:271–282

    Google Scholar 

  • Boldi P, Vigna S (2012a) Four degrees of separation, really. Arxiv preprint arxiv:1205.5509

  • Boldi P, Vigna S (2012b). Harmonic centrality (in preparation)

  • Boldi P, Santini M, Vigna S (2009) Page Rank: functional dependencies. ACM Trans Inf Sys 27(4):1–23

    Article  Google Scholar 

  • Boldi P, Rosa M, Vigna S (2011a) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Srinivasan S, Ramamritham S, Kumar A, Ravindra MP, Bertino E, Kumar R (eds) Proceedings of the 20th international conference on World Wide Web. ACM, pp 625–634

  • Boldi P, Rosa M, Vigna S (2011b) Robustness of social networks: Comparative results based on distance distributions. In: Social Informatics, Third International Conference, SocInfo 2011. Lecture Notes in Computer Science, vol 6894. Springer, Berlin, pp 8–21

  • Borgatti SP (2005) Centrality and network flow. Soc Netw 27(1):55–71

    Article  Google Scholar 

  • Borgatti SP (2006) Identifying sets of key players in a social network. Comput Math Organ Theory 12:21–34

    Article  MATH  Google Scholar 

  • Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136

    Article  Google Scholar 

  • Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177

    Article  MATH  Google Scholar 

  • Brandes U, Erlebach T (eds) (2005a) Network analysis: methodological foundations. Lecture Notes in Computer Science, vol 3418. Springer, Berlin

  • Brandes U, Erlebach T (2005b) Network analysis: methodological foundations (Lecture Notes in Computer Science). Number 3418 in Lecture Notes in Computer Science. Springer, Berlin

  • Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 219–228

  • Cohen E (1997) Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci 55:441–453

    Article  MATH  Google Scholar 

  • Cohen R, Havlin S (2010) Complex networks: structure, robustness and function. Cambridge University Press, Cambridge

  • Donato D, Leonardi S, Millozzi S, Tsaparas P (2008) Mining the inner structure of the web graph. J Phys A: Math Theor 41(22):224017

    Article  MathSciNet  Google Scholar 

  • Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat 37(1):36–48

    MathSciNet  Google Scholar 

  • Flajolet P, Fusy É, Gandouet O, Meunier F (2007) HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings of the 13th conference on analysis of algorithm (AofA 07), Juan-les-Pins, pp 127–146

  • Fogaras D (2003) Where to start browsing the web? In: Innovative Internet Community Systems, Third International Workshop, IICS 2003. Lecture Notes in Computer Science, vol 2877. Springer, Leipzig, pp 65–79

  • Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41

    Article  Google Scholar 

  • Gilbert JR, Reinhardt S, Shah V (2007) High-performance graph algorithms from parallel sparse matrices. In: Kagstrom B, Elmroth E, Dongarra J, Wasniewski J (eds) Applied parallel computing. State of the Art in Scientific Computing (8th PARA’06). Lecture Notes in Computer Science, vol 4699. Springer, New York, pp 260–269

  • Kendall MG (1945) The treatment of ties in ranking problems. Biometrika 33(3):239–251

    Article  MathSciNet  MATH  Google Scholar 

  • Langville AN, Meyer CD (2004) Deeper inside Page Rank. Internet Math 1(3):355–400

    Article  MathSciNet  Google Scholar 

  • Li L, Alderson DL, Doyle J, Willinger W (2005) Towards a theory of scale-free graphs: definition, properties, and implications. Internet Math 2(4):431–523

    Article  MathSciNet  MATH  Google Scholar 

  • Marchiori M, Latora V (2000) Harmony in the small-world. Physica A: Stat Mech Appl 285(3-4):539–546

    Article  MATH  Google Scholar 

  • Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego

  • Newman MEJ, Park J (2003) Why social networks are different from other types of networks. Phys Rev E 68(3):036122

    Article  Google Scholar 

  • Page L, Brin S, Motwani R, Winograd T (1998) The Page Rank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford

  • Palmer CR, Gibbons PB, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 81–90

  • Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106. doi:10.1103/PhysRevE.76.036106

    Google Scholar 

  • Travers J, Milgram S (1969) An experimental study of the small world problem. Sociometry 32(4):425–443

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sebastiano Vigna.

Additional information

Paolo Boldi and Sebastiano Vigna have been partially supported by a Yahoo! faculty grant, by MIUR (Italian Ministry of University and Research) and by the EU-FET grant NADINE (GA 288956).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Boldi, P., Rosa, M. & Vigna, S. Robustness of social and web graphs to node removal. Soc. Netw. Anal. Min. 3, 829–842 (2013). https://doi.org/10.1007/s13278-013-0096-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13278-013-0096-x

Keywords

Navigation