Abstract
Social networks are a source of large-scale graphs. We study how social network algorithms behave on sparsified versions of such networks with two motivations in mind:
1. In practice, it is challenging to collect, store and process the entire, often constantly growing, network, so it is important to understand how algorithms behave on incomplete views of a network.
2. Even if one has the full network, algorithms may be infeasible at such a large scale, and the only option may be to sparsify the network to make it computationally tractable while still maintaining the fidelity of the social network algorithms.
We present a variety of methods for sparsifying a network based on linear regression and linear-algebraic sampling for graph reconstruction. We compare the methods against one another with respect to clustering. Specifically, given a graph G, we sample the columns of its adjacency matrix and reconstruct the remaining columns using only those sampled columns to obtain Ĝ, the reconstructed approximation of G. We then perform clustering on G and Ĝ to get two sets of clusters and compute their modularity, fitness and centrality. Our thorough experimentation reveals that graphs reconstructed through our methodology preserve (in some cases, even improve) community structure while being orders of magnitude more efficient in both storage and computation. We show similar results if the target is the prominence of nodes rather than clusters.
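The reconstruct-then-compare pipeline summarized above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an undirected, unweighted graph, so observing column j of the adjacency matrix also reveals row j. It uses a Nyström-style least-squares fit: each column is regressed on the block W of links among sampled nodes and then predicted from the sampled columns C, giving roughly C · pinv(W) · Cᵀ. It rounds at a hypothetical threshold of 0.5. Finally, it scores a fixed partition with Newman-Girvan modularity rather than running a clustering algorithm on G and Ĝ. The function names `reconstruct_from_columns` and `modularity`, the threshold, and the toy graph are all illustrative assumptions.

```python
import numpy as np


def reconstruct_from_columns(A, sampled, threshold=0.5):
    """Hypothetical sketch: rebuild an undirected graph from the sampled columns of A.

    Assumes A is symmetric, so for every node i its links to the sampled nodes
    are known (row i of C). Column i is fitted by least squares, W @ x_i ~= C[i, :],
    and then predicted as C @ x_i (a Nystrom-style approximation, not
    necessarily the regression used in the paper).
    """
    sampled = np.asarray(sampled)
    C = A[:, sampled].astype(float)               # n x k observed columns
    W = C[sampled, :]                             # k x k links among sampled nodes
    X, *_ = np.linalg.lstsq(W, C.T, rcond=None)   # k x n regression coefficients
    A_hat = C @ X                                 # real-valued n x n estimate
    A_hat[:, sampled] = C                         # observed columns are kept exactly
    A_hat[sampled, :] = C.T                       # ...and so are the matching rows
    G_hat = (A_hat >= threshold).astype(int)      # round to a 0/1 adjacency matrix
    G_hat = np.maximum(G_hat, G_hat.T)            # safeguard: keep it symmetric
    np.fill_diagonal(G_hat, 0)                    # no self-loops
    return G_hat


def modularity(A, labels):
    """Newman-Girvan modularity Q of the partition `labels` on adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                             # node degrees
    two_m = k.sum()                               # 2 * number of edges
    same = labels[:, None] == labels[None, :]     # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m                # modularity matrix
    return float(B[same].sum() / two_m)


# Toy graph (hypothetical): two 4-cliques joined by the bridge edge 3-4.
n = 8
A = np.zeros((n, n), dtype=int)
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

G_hat = reconstruct_from_columns(A, sampled=[0, 1, 6, 7])
print(round(modularity(A, labels), 4), round(modularity(G_hat, labels), 4))  # 0.4231 0.5
```

With nodes 0, 1, 6 and 7 sampled, the sketch recovers every edge except the bridge 3-4, whose endpoints are both unsampled. Ĝ therefore splits into two disjoint cliques, and the planted partition scores slightly higher modularity on Ĝ (0.5) than on G (11/26 ≈ 0.4231). In a real pipeline the labels would come from clustering G and Ĝ separately.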
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Hegde, K., Magdon-Ismail, M., Szymanski, B., Kuzmin, K. (2017). Clustering, Prominence and Social Network Analysis on Incomplete Networks. In: Cherifi, H., Gaito, S., Quattrociocchi, W., Sala, A. (eds) Complex Networks & Their Applications V. COMPLEX NETWORKS 2016. Studies in Computational Intelligence, vol 693. Springer, Cham. https://doi.org/10.1007/978-3-319-50901-3_23
DOI: https://doi.org/10.1007/978-3-319-50901-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50900-6
Online ISBN: 978-3-319-50901-3
eBook Packages: Engineering (R0)