research-article

A separability framework for analyzing community structure

Published: 01 February 2014
Abstract

Four major factors govern the intricacies of community extraction in networks: (1) the literature offers a multitude of disparate community detection algorithms whose output exhibits high structural variability across the collection, (2) communities identified by algorithms may differ structurally from real communities that arise in practice, (3) there is no consensus characterizing how to discriminate communities from noncommunities, and (4) the application domain includes a wide variety of networks of fundamentally different natures. In this article, we present a class separability framework to tackle these challenges through a comprehensive analysis of community properties. Our approach enables the assessment of the structural dissimilarity among the output of multiple community detection algorithms and between the output of algorithms and communities that arise in practice. In addition, our method provides us with a way to organize the vast collection of community detection algorithms by grouping those that behave similarly. Finally, we identify the most discriminative graph-theoretical properties of community signature and the small subset of properties that account for most of the biases of the different community detection algorithms. We illustrate our approach with an experimental analysis, which reveals nuances of the structure of real and extracted communities. In our experiments, we furnish our framework with the output of 10 different community detection procedures, representative of categories of popular algorithms available in the literature, applied to a diverse collection of large-scale real network datasets whose domains span biology, online shopping, and social systems. We also analyze communities identified by annotations that accompany the data, which reflect exemplar communities in various domains. We characterize these communities using a broad spectrum of community properties to produce the different structural classes.
As our experiments show that community structure is not a universal concept, our framework enables an informed choice of the most suitable community detection method for identifying communities of a specific type in a given network and allows for a comparison of existing community detection algorithms while guiding the design of new ones.
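
The idea of treating community sets as classes in a feature space can be illustrated with a small sketch. The code below is not the article's procedure: the two properties (internal density and conductance), the leave-one-out nearest-neighbour score, the toy graph, and the labels "dense" and "sparse" are all assumptions chosen only to show how structural classes might be compared once each community is summarized by graph-theoretical properties.

```python
from math import dist


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_features(adj, nodes):
    """Return (internal density, conductance) for a node set."""
    members = set(nodes)
    n = len(members)
    # Each internal edge is counted from both endpoints, so halve the sum.
    internal = sum(len(adj[u] & members) for u in members) // 2
    # Edges leaving the set are counted once, from the inside endpoint.
    boundary = sum(len(adj[u] - members) for u in members)
    possible = n * (n - 1) / 2
    density = internal / possible if possible else 0.0
    volume = 2 * internal + boundary
    conductance = boundary / volume if volume else 0.0
    return (density, conductance)


def loo_separability(samples):
    """Leave-one-out 1-nearest-neighbour accuracy over (features, label) pairs.

    Values near 1.0 suggest the labelled classes occupy distinct regions of
    feature space; values near chance suggest they are hard to tell apart.
    """
    correct = 0
    for i, (x, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        nearest = min(others, key=lambda s: dist(x, s[0]))
        correct += nearest[1] == label
    return correct / len(samples)


# Toy graph (hypothetical): two triangles joined by a bridge, plus a path.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (2, 3), (5, 6), (6, 7), (7, 8)]
adj = build_adjacency(edges)

# Hypothetical classes: each label stands in for the source of a community,
# for example one detection algorithm or one set of annotations.
labelled = [({0, 1, 2}, "dense"), ({3, 4, 5}, "dense"),
            ({6, 7, 8}, "sparse"), ({5, 6, 7}, "sparse")]
samples = [(community_features(adj, c), label) for c, label in labelled]
score = loo_separability(samples)
print(f"leave-one-out separability: {score:.2f}")
```

In a realistic setting each class would hold many communities and many more properties, and the features would likely need rescaling before distances are meaningful; the sketch skips both for brevity.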

    • Published in

      ACM Transactions on Knowledge Discovery from Data  Volume 8, Issue 1
      Casin special issue
      February 2014
      157 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/2582178
      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2014
      • Accepted: 1 August 2013
      • Revised: 1 May 2013
      • Received: 1 October 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
