A separability framework for analyzing community structure

Published: 01 February 2014

Abstract

Four major factors govern the intricacies of community extraction in networks: (1) the literature offers a multitude of disparate community detection algorithms whose output exhibits high structural variability across the collection, (2) communities identified by algorithms may differ structurally from real communities that arise in practice, (3) there is no consensus on how to discriminate communities from noncommunities, and (4) the application domain includes a wide variety of networks of fundamentally different natures. In this article, we present a class separability framework to tackle these challenges through a comprehensive analysis of community properties. Our approach enables the assessment of the structural dissimilarity among the outputs of multiple community detection algorithms and between the output of algorithms and communities that arise in practice. In addition, our method provides us with a way to organize the vast collection of community detection algorithms by grouping those that behave similarly. Finally, we identify the most discriminative graph-theoretical properties of community signature and the small subset of properties that account for most of the biases of the different community detection algorithms. We illustrate our approach with an experimental analysis, which reveals nuances of the structure of real and extracted communities. In our experiments, we furnish our framework with the output of 10 different community detection procedures, representative of categories of popular algorithms available in the literature, applied to a diverse collection of large-scale real network datasets whose domains span biology, online shopping, and social systems. We also analyze communities identified by annotations that accompany the data, which reflect exemplar communities in various domains. We characterize these communities using a broad spectrum of community properties to produce the different structural classes.
As our experiments show that community structure is not a universal concept, our framework enables an informed choice of the most suitable community detection method for identifying communities of a specific type in a given network and allows for a comparison of existing community detection algorithms while guiding the design of new ones.
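
The idea of class separability over community properties can be illustrated with a minimal sketch. It is not the authors' implementation: the two properties (internal edge density and the fraction of edge endpoints leaving the set), the leave-one-out nearest-neighbour score, and every name below (community_features, loo_1nn_accuracy, adj) are assumptions chosen for illustration. Each community becomes a feature vector labelled with its class (for example, the algorithm that produced it), and a score near 1.0 suggests that the classes are structurally distinguishable.

```python
from math import dist


def community_features(adj, nodes):
    """Return (internal edge density, boundary ratio) for a node set.

    adj maps each node to the set of its neighbours (undirected graph).
    Both properties are illustrative choices, not the paper's feature set.
    """
    nodes = set(nodes)
    n = len(nodes)
    # Each internal edge is counted once from each endpoint, so halve the sum.
    internal = sum(len(adj[u] & nodes) for u in nodes) // 2
    # Edges with exactly one endpoint inside the set.
    boundary = sum(len(adj[u] - nodes) for u in nodes)
    possible = n * (n - 1) / 2
    density = internal / possible if possible else 0.0
    # Total number of edge endpoints that lie inside the set.
    volume = 2 * internal + boundary
    boundary_ratio = boundary / volume if volume else 0.0
    return (density, boundary_ratio)


def loo_1nn_accuracy(samples):
    """Leave-one-out 1-nearest-neighbour accuracy over (features, label) pairs.

    Used here as a simple stand-in for a class separability measure.
    """
    correct = 0
    for i, (x, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        _, nearest_label = min(others, key=lambda s: dist(x, s[0]))
        correct += nearest_label == label
    return correct / len(samples)


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
triangle = community_features(adj, {0, 1, 2})
bridge = community_features(adj, {2, 3})
```

In this toy graph the triangle {0, 1, 2} and the bridging pair {2, 3} have the same internal density but very different boundary ratios, which shows why a broad set of properties is needed before classes can be separated.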


    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 8, Issue 1
    Casin special issue
    February 2014
    157 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/2582178

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 February 2014
    Accepted: 01 August 2013
    Revised: 01 May 2013
    Received: 01 October 2012
    Published in TKDD Volume 8, Issue 1

    Author Tags

    1. Class separability
    2. community structure
    3. detection algorithms
    4. networks

    Qualifiers

    • Research-article
    • Research
    • Refereed
