research-article

Structure and Overlaps of Ground-Truth Communities in Networks

Authors:
Jaewon Yang

Stanford University

Stanford University
View Profile

,
Jure Leskovec

Stanford University

Stanford University
View Profile

ACM Transactions on Intelligent Systems and Technology Volume 5 Issue 2Article No.: 26pp 1–35https://doi.org/10.1145/2594454

Published:30 April 2014Publication History

ACM Transactions on Intelligent Systems and Technology

Abstract

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure communities in large networks remains relatively limited. In particular, due to the unavailability of labeled ground-truth data, it was traditionally very hard to develop accurate models of network community structure.

Here we use six large social, collaboration, and information networks where nodes explicitly state their ground-truth community memberships. For example, nodes in social networks join into explicitly defined interest based groups, and we use such groups as explicitly labeled ground-truth communities. We use such ground-truth communities to study their structural signatures by analyzing how ground-truth communities emerge in networks and how they overlap. We observe some surprising phenomena. First, ground-truth communities contain high-degree hub nodes that reside in community overlaps and link to most of the members of the community. Second, the overlaps of communities are more densely connected than the non-overlapping parts of communities. We show that this in contrast to the conventional wisdom that community overlaps are more sparsely connected than the non-overlapping parts themselves. We then show that many existing models of network communities do not capture dense community overlaps. This in turn means that most present models and community detection methods confuse overlaps as separate communities. In contrast, we present the community-affiliation graph model (AGM), a conceptual model of network community structure. We demonstrate that AGM reliably captures the overall structure of networks as well as the overlapping and hierarchical nature of network communities.

References

Bruno D. Abrahao, Sucheta Soundarajan, John E. Hopcroft, and Robert Kleinberg. 2012. On the separability of structural classes of communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12). 624--632. Google ScholarDigital Library
Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. 2010. Link communities reveal multi-scale complexity in networks. Nature 466.Google Scholar
E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. 2007. Mixed membership stochastic blockmodels. J. Machine Learn. Res. 9, 1981--2014. Google ScholarDigital Library
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 44--54. Google ScholarDigital Library
Brian Ball, Brian Karrer, and M. E. J. Newman. 2011. Efficient and principled method for detecting communities in networks. Phys. Rev. E 84.Google ScholarCross Ref
A.-L. Barabási and Z. N. Oltvai. 2004. Network biology: Understanding the cell's functional organization. Nature Rev. Genetics 5, 2, 101--113.Google ScholarCross Ref
Jeffrey Baumes, Mark Goldberg, and Malik Magdon-ismail. 2005. Efficient identification of overlapping communities. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI'05). 27--36. Google ScholarDigital Library
J. Baumes, M. Goldberg, M. Magdon-Ismail, and A. Wallace. 2004. Discovering hidden groups in communication networks. In Proceedings of the 2nd NSF/NIJ Symposium on Intelligence and Security Informatics (ISI'04).Google Scholar
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Machine Learn. Res. 3, 993--1022. Google ScholarDigital Library
R. L. Breiger. 1974. The duality of persons and groups. Social Forces 53, 2, 181--190.Google ScholarCross Ref
D. Chakrabarti and C. Faloutsos. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 1, 2. Google ScholarDigital Library
D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 4th SIAM International Conference on Data Mining (SDM'04).Google Scholar
A. Clauset, M. E. J. Newman, and C. Moore. 2004. Finding community structure in very large networks. Phys. Rev. E 70.Google ScholarCross Ref
V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2005. Characterization and modeling of protein protein interaction networks. Physica A Stat. Mech. Appl. 352, 1--27.Google ScholarCross Ref
V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2006. Detecting rich-club ordering in complex networks. Nature Phys. 2, 110--115.Google ScholarCross Ref
I. S. Dhillon, Y. Guan, and B. Kulis. 2007. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Machine Intell. 29, 11, 1944--1957. Google ScholarDigital Library
T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80.Google ScholarCross Ref
M. Faloutsos, P. Faloutsos, and C. Faloutsos. 1999. On power-law relationships of the internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262. Google ScholarDigital Library
Illes J. Farkas, Imre Derényi, Albert-László Barabási, and Tamas Vicsek. 2001. Spectra of ‘Real-World’ graphs: Beyond the semi-circle law. Phys. Rev. E 64.Google ScholarCross Ref
Scott L. Feld. 1981. The focused organization of social ties. Amer. J. Sociol. 86, 5, 1015--1035.Google ScholarCross Ref
M. Fiedler. 1973. Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 98, 298--305.Google ScholarCross Ref
G. W. Flake, S. Lawrence, and C. L. Giles. 2000. Efficient identification of Web communities. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). 150--160. Google ScholarDigital Library
S. Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3--5, 75--174.Google ScholarCross Ref
S. Fortunato and M. Barthélemy. 2007. Resolution limit in community detection. Proc. Nat. Acad. Sci. U.S.A. 104, 1, 36--41.Google ScholarCross Ref
M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. U.S.A. 99, 12, 7821--7826.Google ScholarCross Ref
David F. Gleich and C. Seshadhri. 2012. Neighborhoods are good communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12).Google Scholar
M. S. Granovetter. 1973. The strength of weak ties. Amer. J. Sociol. 78, 1360--1380.Google ScholarCross Ref
S. Gregory. 2010. Finding overlapping communities in networks by label propagation. New J. Physics 12, 10.Google ScholarCross Ref
S. Guattery and G. L. Miller. 1998. On the quality of spectral separators. SIAM J. Matrix Anal. Appl. 19, 701--719. Google ScholarDigital Library
Keith Henderson and Tina Eliassi-Rad. 2009. Applying latent dirichlet allocation to group discovery in large graphs. In Proceedings of the ACM Symposium on Applied Computing (SAC'09). 1456--1461. Google ScholarDigital Library
Paul W. Holland, Kathryn Laskey, and Samuel Leinhardt. 1983. Stochastic blockmodels: First steps. Social Netw. 5, 2, 109--137.Google ScholarCross Ref
J. Hopcroft, O. Khan, B. Kulis, and B. Selman. 2003. Natural communities in large linked networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 541--546. Google ScholarDigital Library
S. Kairam, D. Wang, and J. Leskovec. 2012. The life and death of online groups: Predicting group growth and longevity. In Proceedings of the ACM International Conference on Web Search and Data Minig (WSDM'12). Google ScholarDigital Library
R. Kannan, S. Vempala, and A. Vetta. 2004. On clusterings: Good, bad and spectral. J. ACM 51, 3, 497--515. Google ScholarDigital Library
B. Karrer, E. Levina, and M. E. J. Newman. 2008. Robustness of community structure in networks. Phys. Rev. E 77.Google ScholarCross Ref
Brian Karrer and M. E. J. Newman. 2011. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83.Google ScholarCross Ref
G. Karypis and V. Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359--392. Google ScholarDigital Library
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the Web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS'00). 57. Google ScholarDigital Library
Andrea Lancichinetti and Santo Fortunato. 2009. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E 80, 1.Google ScholarCross Ref
Silvio Lattanzi and D. Sivakumar. 2009. Affiliation networks. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC'09). 427--434. Google ScholarDigital Library
Conrad Lee, Fergal Reid, Aaron McDaid, and Neil Hurley. 2010. Detecting highly overlapping community structure by greedy clique expansion. In Proceedings of the 4th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'10).Google Scholar
J. Leskovec, L. A. Adamic, and B. A. Huberman. 2007a. The dynamics of viral marketing. ACM Trans. Web 1, 1. Google ScholarDigital Library
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05). 177--187. Google ScholarDigital Library
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2007b. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 1. Google ScholarDigital Library
J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1, 29--123.Google ScholarCross Ref
J. Leskovec, K. Lang, and M. Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW'10). Google ScholarDigital Library
Wangqun Lin, Xiangnan Kong, Philip S. Yu, Quanyuan Wu, Yan Jia, and Chuan Li. 2012. Community detection in incomplete information networks. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). ACM, New York, NY, 341--350. Google ScholarDigital Library
J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS'12). 548--556.Google Scholar
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC'07). 29--42. Google ScholarDigital Library
M. Mitzenmacher. 2004. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 2, 226--251.Google ScholarCross Ref
M. Molloy and B. Reed. 1995. A critical point for random graphs with a given degree sequence. Random Struct. Algorit. 6, 161--180. Google ScholarDigital Library
Morten Mørup, Mikkel N. Schmidt, and Lars Kai Hansen. 2011. Infinite multiple membership relational modeling for complex networks. CoRR abs/1101.5097.Google Scholar
Raj Rao Nadakuditi and M. E. J. Newman. 2012. Graph spectra and the detectability of community structure in networks. Phys. Rev. Lett. 108.Google ScholarCross Ref
Tamás Nepusz, Haiyuan Yu, and Alberto Paccanaro. 2012. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods 9, 471--472.Google ScholarCross Ref
M. E. J. Newman. 2006. Modularity and community structure in networks. Proc. Nat. Acad. Sci. U.S.A. 103, 23, 8577--8582.Google ScholarCross Ref
M. E. J. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69.Google Scholar
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 7043, 814--818.Google Scholar
S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. 2012. Community detection in social media. Data Mining Knowl. Discov. 24, 3, 515--554. Google ScholarDigital Library
Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. 2011. Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E 83, 6.Google ScholarCross Ref
Y. Qi, J. K. Seetharaman, and Z. B. Joseph. 2005. Random forest similarity for protein-protein interaction prediction from multiple sources. In Proceedings of the Pacific Symposium on Biocomputing.Google Scholar
F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. 2004. Defining and identifying communities in networks. Proc. Nat. Acad. Sci. U.S.A. 101, 9, 2658--2663.Google ScholarCross Ref
M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral. 2007. Extracting the hierarchical organization of complex systems. Proc. Nat. Acad. Sci. U.S.A. 104, 18874--18874.Google ScholarCross Ref
E. N. Sawardecker, M. Sales-Pardo, and L. A. N. Amaral. 2009. Detection of node group membership in networks with group overlap. Euro. Phys. J. B 67, 277--284.Google Scholar
S. E. Schaeffer. 2007. Graph Clustering. Comput. Sci. Rev. 1, 1, 27--64. Google ScholarDigital Library
C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdos-Renyi graphs. Phys. Rev. E 85.Google ScholarCross Ref
Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Stat. Mech. Appl. 388, 8, 1706--1712.Google ScholarCross Ref
J. Shi and J. Malik. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22, 8, 888--905. Google ScholarDigital Library
Georg Simmel. 1964. Conflict and the Web of Group Affiliations. Simon and Schuster.Google Scholar
D. A. Spielman and S.-H. Teng. 2007. Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra Appl. 421, 2--3, 284--305.Google ScholarCross Ref
Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 797--806. Google ScholarDigital Library
Chayant Tantipathananandh, Tanya Berger-Wolf, and David Kempe. 2007. A framework for community identification in dynamic social networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07). 717--726. Google ScholarDigital Library
Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 608--617. Google ScholarDigital Library
U. von Luxburg. 2007. A tutorial on spectral clustering. Stat. Comput. 17, 395--416. Google ScholarDigital Library
Chunyan Wang, Mao Ye, and Wang-Chien Lee. 2012. From face-to-face gathering to social structure. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12). 465--474. Google ScholarDigital Library
D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 440--442.Google ScholarCross Ref
Jaewon Yang and Jure Leskovec. 2012a. Community-affiliation graph model for overlapping network community detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 1170--1175. Google ScholarDigital Library
Jaewon Yang and Jure Leskovec. 2012b. Defining and evaluating network communities based on ground-truth. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 745--754. Google ScholarDigital Library
Jaewon Yang and Jure Leskovec. 2012c. Structure and overlaps of communities in large networks. In Proceedings of the 6th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'12).Google Scholar
Jaewon Yang and Jure Leskovec. 2013a. Defining and evaluating network communities based on ground-truth. Knowl. Inform. Syst. J.Google Scholar
J. Yang and J. Leskovec. 2013b. Overlapping community detection at scale: A non-negative factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM'13). 587--596. Google ScholarDigital Library
E. Zheleva, H. Sharara, and L. Getoor. 2009. Co-evolution of social and affiliation networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 1007--1016. Google ScholarDigital Library
Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, and Hongyuan Zha. 2006. Probabilistic models for discovering e-communities. In Proceedings of the 15th International Conference on World Wide Web (WWW'06). 173--182. Google ScholarDigital Library

Index Terms

Structure and Overlaps of Ground-Truth Communities in Networks
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining

Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Read More
Defining and evaluating network communities based on ground-truth
MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a ...
Read More
Defining and evaluating network communities based on ground-truth

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Intelligent Systems and Technology Volume 5, Issue 2
Special Issue on Linking Social Granularity and Functions
April 2014
347 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/2611448
Issue’s Table of Contents

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 April 2014
- Accepted: 1 April 2013
- Revised: 1 March 2013
- Received: 1 January 2013
Published in tist Volume 5, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Network communities
affiliation networks
social networks
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 69
  Total Citations
  View Citations
- 1,429
  Total Downloads
- Downloads (Last 12 months)66
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Structure and Overlaps of Ground-Truth Communities in Networks

ACM Transactions on Intelligent Systems and Technology

Abstract

References

Cited By

Index Terms

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach

Defining and evaluating network communities based on ground-truth

Defining and evaluating network communities based on ground-truth

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Structure and Overlaps of Ground-Truth Communities in Networks

ACM Transactions on Intelligent Systems and Technology

Abstract

References

Cited By

Index Terms

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach

Defining and evaluating network communities based on ground-truth

Defining and evaluating network communities based on ground-truth

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media