research-article

Structure and Overlaps of Ground-Truth Communities in Networks

Published: 30 April 2014

Abstract

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding of the structure of communities in large networks remains relatively limited. In particular, due to the unavailability of labeled ground-truth data, it has traditionally been very hard to develop accurate models of network community structure.

Here we use six large social, collaboration, and information networks where nodes explicitly state their ground-truth community memberships. For example, nodes in social networks join explicitly defined interest-based groups, and we use such groups as explicitly labeled ground-truth communities. We use these ground-truth communities to study their structural signatures by analyzing how ground-truth communities emerge in networks and how they overlap. We observe some surprising phenomena. First, ground-truth communities contain high-degree hub nodes that reside in community overlaps and link to most of the members of the community. Second, the overlaps of communities are more densely connected than the non-overlapping parts of communities. We show that this is in contrast to the conventional wisdom that community overlaps are more sparsely connected than the non-overlapping parts themselves. We then show that many existing models of network communities do not capture dense community overlaps. This in turn means that most present models and community detection methods mistake overlaps for separate communities. In contrast, we present the community-affiliation graph model (AGM), a conceptual model of network community structure. We demonstrate that AGM reliably captures the overall structure of networks as well as the overlapping and hierarchical nature of network communities.
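
The abstract does not spell out how AGM generates edges. The Python sketch below is an illustration, not the authors' implementation. It assumes the commonly described AGM formulation in which each community c carries a link probability p_c and two nodes connect with probability 1 - prod(1 - p_c) over the communities they share. The function names, the dictionary input format, and the example probabilities are hypothetical. Under this assumption, nodes that share more communities are more likely to link, which is one way dense overlaps can arise.

```python
from itertools import combinations
import random


def edge_probability(shared_probs):
    """Probability that two nodes link, given the per-community
    probabilities p_c of every community they share (assumed formulation)."""
    no_edge = 1.0
    for p in shared_probs:
        # Each shared community independently fails to create the edge.
        no_edge *= (1.0 - p)
    return 1.0 - no_edge


def sample_agm(memberships, community_prob, seed=0):
    """Sample an undirected edge set from node-community affiliations.

    memberships: dict mapping node -> set of community labels (hypothetical format)
    community_prob: dict mapping community label -> p_c in [0, 1]
    """
    rng = random.Random(seed)
    edges = set()
    for u, v in combinations(sorted(memberships), 2):
        shared = memberships[u] & memberships[v]
        p = edge_probability(community_prob[c] for c in shared)
        if rng.random() < p:
            edges.add((u, v))
    return edges


if __name__ == "__main__":
    # Two nodes that share one community versus two communities.
    print(edge_probability([0.3]))
    print(edge_probability([0.3, 0.3]))
```

With p_c = 0.3 for both communities, a pair of nodes sharing one community links with probability about 0.3, while a pair in the overlap of both links with probability about 0.51. This sketch omits any background probability for node pairs that share no community.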

    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 2
      Special Issue on Linking Social Granularity and Functions
      April 2014
      347 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2611448

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 April 2014
      • Accepted: 1 April 2013
      • Revised: 1 March 2013
      • Received: 1 January 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
