Clustering, Prominence and Social Network Analysis on Incomplete Networks

  • Conference paper
Complex Networks & Their Applications V (COMPLEX NETWORKS 2016 2016)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 693))

Abstract

Social networks are a source of large scale graphs. We study how social network algorithms behave on sparsified versions of such networks with two motivations in mind:

  1. In practice, it is challenging to collect, store and process the entire, often constantly growing, network, so it is important to understand how algorithms behave on incomplete views of a network.

  2. Even if one has the full network, algorithms may be infeasible at such a large scale, and the only option may be to sparsify the network to make it computationally tractable while still maintaining the fidelity of the social network algorithms.

We present a variety of methods for sparsifying a network based on linear regression and linear algebraic sampling for graph reconstruction. We compare the methods against one another with respect to clustering. Specifically, given a graph G, we sample the columns of its adjacency matrix and reconstruct the remaining columns using only those sampled columns to obtain Ĝ, the reconstructed approximation of G. We then perform clustering on G and Ĝ to get two sets of clusters and compute their modularity, fitness and centrality. Our thorough experimentation reveals that graphs reconstructed through our methodology preserve (in some cases, even improve) community structure while being orders of magnitude more efficient both in storage and computation. We show similar results if the target is prominence of nodes rather than clusters.
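The sample-and-reconstruct step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric adjacency matrix and uses a Nyström-style least-squares fit, in which the sampled entries of every column are regressed on the sampled columns. The names `reconstruct_from_columns` and `modularity`, the toy graph, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def reconstruct_from_columns(A, cols):
    """Approximate a symmetric matrix A using only the columns listed in cols."""
    C = A[:, cols]   # sampled columns, shape (n, c)
    W = C[cols, :]   # c x c block where sampled rows and columns intersect
    # Least-squares fit of each column's sampled entries on W, followed by
    # prediction of the full column from C. In matrix form: A_hat = C W^+ C^T.
    return C @ np.linalg.pinv(W) @ C.T

def modularity(A, labels):
    """Newman modularity Q of a partition of an undirected graph."""
    k = A.sum(axis=1)                          # node degrees
    two_m = k.sum()                            # twice the number of edges
    same = labels[:, None] == labels[None, :]  # True where two nodes share a cluster
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Toy graph (an assumption): two disjoint complete bipartite blocks K_{2,2}.
A = np.zeros((8, 8))
for left, right in [((0, 1), (2, 3)), ((4, 5), (6, 7))]:
    for i in left:
        for j in right:
            A[i, j] = A[j, i] = 1.0

# Columns are fixed here for reproducibility; in practice they would be sampled.
cols = [0, 2, 4, 6]
A_hat = reconstruct_from_columns(A, cols)
G_hat = (A_hat > 0.5).astype(float)   # threshold back to a 0/1 adjacency matrix

labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # one cluster per block
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print("relative Frobenius error:", rel_err)
print("modularity on G:", modularity(A, labels))
print("modularity on G_hat:", modularity(G_hat, labels))
```

On this toy graph the four chosen columns span the whole column space, so the reconstruction is exact. On real networks the sampled columns capture only part of the structure, and the quality of Ĝ depends on how many columns are sampled and how they are chosen.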

Author information

Correspondence to Kshiteesh Hegde.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hegde, K., Magdon-Ismail, M., Szymanski, B., Kuzmin, K. (2017). Clustering, Prominence and Social Network Analysis on Incomplete Networks. In: Cherifi, H., Gaito, S., Quattrociocchi, W., Sala, A. (eds) Complex Networks & Their Applications V. COMPLEX NETWORKS 2016 2016. Studies in Computational Intelligence, vol 693. Springer, Cham. https://doi.org/10.1007/978-3-319-50901-3_23

  • DOI: https://doi.org/10.1007/978-3-319-50901-3_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50900-6

  • Online ISBN: 978-3-319-50901-3

  • eBook Packages: Engineering, Engineering (R0)