Abstract
Four major factors govern the intricacies of community extraction in networks: (1) the literature offers a multitude of disparate community detection algorithms whose output exhibits high structural variability across the collection, (2) communities identified by algorithms may differ structurally from real communities that arise in practice, (3) there is no consensus characterizing how to discriminate communities from noncommunities, and (4) the application domain includes a wide variety of networks of fundamentally different natures. In this article, we present a class separability framework to tackle these challenges through a comprehensive analysis of community properties. Our approach enables the assessment of the structural dissimilarity among the output of multiple community detection algorithms and between the output of algorithms and communities that arise in practice. In addition, our method provides us with a way to organize the vast collection of community detection algorithms by grouping those that behave similarly. Finally, we identify the most discriminative graph-theoretical properties of community signature and the small subset of properties that account for most of the biases of the different community detection algorithms. We illustrate our approach with an experimental analysis, which reveals nuances of the structure of real and extracted communities. In our experiments, we furnish our framework with the output of 10 different community detection procedures, representative of categories of popular algorithms available in the literature, applied to a diverse collection of large-scale real network datasets whose domains span biology, online shopping, and social systems. We also analyze communities identified by annotations that accompany the data, which reflect exemplar communities in various domains. We characterize these communities using a broad spectrum of community properties to produce the different structural classes.
As our experiments show that community structure is not a universal concept, our framework enables an informed choice of the most suitable community detection method for identifying communities of a specific type in a given network and allows for a comparison of existing community detection algorithms while guiding the design of new ones.
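The idea of treating each algorithm's output as a structural class and asking how separable those classes are can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the article: the two features (internal edge density and conductance), the seven-edge graph, the hypothetical labels `algo_A` and `algo_B`, and the use of leave-one-out nearest-neighbor accuracy as a stand-in separability score.

```python
from itertools import combinations
from math import dist


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_features(adj, community):
    """Return (internal edge density, conductance) for a set of nodes.

    These two properties are illustrative choices, not the article's list.
    """
    members = set(community)
    n = len(members)
    # Edges with both endpoints inside the community.
    internal = sum(1 for u, v in combinations(members, 2) if v in adj[u])
    # Edges with exactly one endpoint inside the community.
    boundary = sum(1 for u in members for v in adj[u] if v not in members)
    volume = 2 * internal + boundary  # sum of member degrees
    density = internal / (n * (n - 1) / 2) if n > 1 else 0.0
    conductance = boundary / volume if volume else 0.0
    return (density, conductance)


def loo_nn_accuracy(labelled):
    """Leave-one-out 1-nearest-neighbor accuracy over (features, label) pairs.

    Used here as an assumed proxy for class separability.
    """
    correct = 0
    for i, (x, label) in enumerate(labelled):
        others = [p for j, p in enumerate(labelled) if j != i]
        nearest = min(others, key=lambda p: dist(x, p[0]))
        correct += nearest[1] == label
    return correct / len(labelled)


# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = build_adjacency(edges)

# Hypothetical outputs of two detection procedures (one class per procedure).
classes = {
    "algo_A": [{0, 1, 2}, {3, 4, 5}],
    "algo_B": [{1, 2, 3}, {2, 3, 4}],
}
labelled = [
    (community_features(adj, c), name)
    for name, communities in classes.items()
    for c in communities
]
accuracy = loo_nn_accuracy(labelled)
```

Under this proxy, an accuracy close to 1 suggests that the two classes occupy distinct regions of the feature space, while an accuracy near chance suggests that the procedures produce structurally similar communities. A real analysis would need many more communities per class and a much broader set of properties.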