Clustering, Prominence and Social Network Analysis on Incomplete Networks

Hegde, Kshiteesh; Magdon-Ismail, Malik; Szymanski, Boleslaw; Kuzmin, Konstantin

doi:10.1007/978-3-319-50901-3_23

Part of the book series: Studies in Computational Intelligence (SCI, volume 693)

Included in the following conference series:

International Workshop on Complex Networks and their Applications

Abstract

Social networks are a source of large scale graphs. We study how social network algorithms behave on sparsified versions of such networks with two motivations in mind:

1. In practice, it is challenging to collect, store and process the entire, often constantly growing, network, so it is important to understand how algorithms behave on incomplete views of a network.
2. Even if one has the full network, algorithms may be infeasible at such a large scale, and the only option may be to sparsify the network to make it computationally tractable while still maintaining the fidelity of the social network algorithms.

We present a variety of methods for sparsifying a network based on linear regression and linear algebraic sampling for graph reconstruction. We compare the methods against one another with respect to clustering. Specifically, given a graph G, we sample the columns of its adjacency matrix and reconstruct the remaining columns using only those sampled columns to obtain Ĝ, the reconstructed approximation of G. We then perform clustering on G and Ĝ to get two sets of clusters and compute their modularity, fitness and centrality. Our thorough experimentation reveals that graphs reconstructed through our methodology preserve (in some cases, even improve) community structure while being orders of magnitude more efficient both in storage and computation. We show similar results if the target is prominence of nodes rather than clusters.
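The column-sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an undirected graph (so the sampled columns also reveal the matching rows), picks the sampled columns by hand, and uses a plain least-squares regression; the function name `reconstruct_from_columns` and the toy graph are hypothetical.

```python
import numpy as np


def reconstruct_from_columns(A, sampled):
    """Approximate a symmetric adjacency matrix from a subset of its columns.

    Assumption (not from the source): the graph is undirected, so the sampled
    columns also reveal the matching rows.
    """
    C = A[:, sampled]   # n x k block of sampled columns: all that is stored
    W = C[sampled, :]   # k x k block: sampled columns restricted to sampled rows
    # By symmetry, row j of C lists the entries of column j of A at the sampled
    # rows. Regress those known entries on W (least squares) to get coefficients.
    X, *_ = np.linalg.lstsq(W, C.T, rcond=None)   # k x n coefficient matrix
    return C @ X        # n x n reconstruction; column j is C @ X[:, j]


# Toy graph: complete bipartite K_{3,3}, whose adjacency matrix has rank 2.
J = np.ones((3, 3))
Z = np.zeros((3, 3))
A = np.block([[Z, J], [J, Z]])

sampled = [0, 3]                      # one column from each side (chosen by hand)
A_hat = reconstruct_from_columns(A, sampled)
G_hat = (A_hat >= 0.5).astype(int)    # threshold back to a 0/1 adjacency matrix
```

For this rank-2 toy graph, two well-chosen columns are enough to recover every edge. On real networks the reconstruction is only approximate, and its quality depends on how many columns are sampled and how they are chosen.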

Author information

Authors and Affiliations

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
Kshiteesh Hegde, Malik Magdon-Ismail, Boleslaw Szymanski & Konstantin Kuzmin

Corresponding author

Correspondence to Kshiteesh Hegde.

Editor information

Editors and Affiliations

University of Burgundy, Dijon, France
Hocine Cherifi
Computer Science Department, University of Milan, Milan, Italy
Sabrina Gaito
IMT Lucca, Lucca, Italy
Walter Quattrociocchi
Bell Labs-Nokia, Blanchardstown Business and Tech Park, Blanchardstown, Ireland
Alessandra Sala

Cite this paper

Hegde, K., Magdon-Ismail, M., Szymanski, B., Kuzmin, K. (2017). Clustering, Prominence and Social Network Analysis on Incomplete Networks. In: Cherifi, H., Gaito, S., Quattrociocchi, W., Sala, A. (eds) Complex Networks & Their Applications V. COMPLEX NETWORKS 2016. Studies in Computational Intelligence, vol 693. Springer, Cham. https://doi.org/10.1007/978-3-319-50901-3_23

DOI: https://doi.org/10.1007/978-3-319-50901-3_23
Published: 30 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50900-6
Online ISBN: 978-3-319-50901-3
eBook Packages: Engineering, Engineering (R0)
