research-article

Community Discovery via Metagraph Factorization

Published: 01 August 2011

Abstract

This work aims at discovering community structure in rich media social networks through the analysis of time-varying, multirelational data. Community structure represents the latent social context of user actions and has important applications such as search and recommendation. The problem is particularly relevant in the enterprise domain, where extracting emergent community structure from enterprise social media can help in forming new collaborative teams, in expertise discovery, and in the long-term reorganization of enterprises based on collaboration patterns. There are several unique challenges: (a) In social media, the context of user actions is constantly changing and coevolving; hence the social context contains time-evolving multidimensional relations. (b) The social context is determined by the available system features and is unique to each social media platform; hence the analysis of such data needs to flexibly incorporate various system features. In this article, we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from dynamic, multidimensional social contexts and interactions. Our work makes three key contributions: (1) the metagraph, a novel relational hypergraph representation for modeling multirelational and multidimensional social data; (2) an efficient multirelational factorization method for community extraction on a given metagraph; and (3) an online method that handles time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from an enterprise and from the public Digg social media Web site suggest that our technique is scalable and can extract meaningful communities from social media contexts.
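The abstract describes multirelational factorization only at a high level. As a rough illustration of the general idea, the sketch below jointly factorizes two relations that share facets, so each latent component can be read as a community spanning users, tags, and items. It is a stand-in technique (collective nonnegative CP factorization with squared loss and Lee-Seung style multiplicative updates), not the MetaFac algorithm from the article. The two-relation "metagraph" (a user-tag-item tensor plus a user-item matrix), the function names `collective_ntf` and `reconstruction_error`, and the toy data are all hypothetical. The `init` argument shows one simple assumption for time-varying data (warm-starting from the previous window's factors), which is not claimed to be the article's incremental method.

```python
import numpy as np


def reconstruction_error(X_uti, X_ui, U, T, V):
    """Summed squared error of both relations under the shared factors."""
    fit_uti = np.einsum("ir,jr,kr->ijk", U, T, V)
    fit_ui = U @ V.T
    return float(np.linalg.norm(X_uti - fit_uti) ** 2 + np.linalg.norm(X_ui - fit_ui) ** 2)


def collective_ntf(X_uti, X_ui, rank, n_iter=200, init=None, seed=0, eps=1e-9):
    """Jointly factorize a (user, tag, item) tensor and a (user, item) matrix.

    Model (hypothetical two-relation metagraph with facets user/tag/item):
        X_uti[i, j, k] ~ sum_r U[i, r] * T[j, r] * V[k, r]
        X_ui[i, k]     ~ sum_r U[i, r] * V[k, r]
    U and V are shared by both relations; component r plays the role of a
    community. Squared loss; multiplicative updates keep all factors >= 0.
    """
    n_u, n_t, n_i = X_uti.shape
    if init is None:
        rng = np.random.default_rng(seed)
        U, T, V = (rng.random((n, rank)) + 0.1 for n in (n_u, n_t, n_i))
    else:
        # Warm start (an assumption), e.g. from the previous time window's factors.
        U, T, V = (np.array(f, dtype=float) for f in init)

    errors = [reconstruction_error(X_uti, X_ui, U, T, V)]
    for _ in range(n_iter):
        # User facet: collects gradient terms from both relations.
        num = np.einsum("ijk,jr,kr->ir", X_uti, T, V) + X_ui @ V
        den = U @ ((T.T @ T) * (V.T @ V)) + U @ (V.T @ V)
        U *= num / (den + eps)
        # Tag facet: appears only in the three-way relation.
        num = np.einsum("ijk,ir,kr->jr", X_uti, U, V)
        den = T @ ((U.T @ U) * (V.T @ V))
        T *= num / (den + eps)
        # Item facet: shared by both relations.
        num = np.einsum("ijk,ir,jr->kr", X_uti, U, T) + X_ui.T @ U
        den = V @ ((U.T @ U) * (T.T @ T)) + V @ (U.T @ U)
        V *= num / (den + eps)
        errors.append(reconstruction_error(X_uti, X_ui, U, T, V))
    return U, T, V, errors


# Toy data generated from known nonnegative rank-2 factors.
rng = np.random.default_rng(1)
U0, T0, V0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X_uti = np.einsum("ir,jr,kr->ijk", U0, T0, V0)
X_ui = U0 @ V0.T

U, T, V, errs = collective_ntf(X_uti, X_ui, rank=2, n_iter=300)
communities = U.argmax(axis=1)  # hard community label per user
```

Sharing `U` and `V` across relations is what makes the factorization "multirelational": evidence from the user-item matrix and the user-tag-item tensor both shape the same community memberships.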
We illustrate the usefulness of our framework through two prediction tasks: (1) in the enterprise dataset, predicting users’ future interests in tag usage, and (2) in the Digg dataset, predicting users’ future interests in voting on and commenting on Digg stories. Our predictions significantly outperform baseline methods (including the aspect model and tensor analysis), indicating that metagraphs are a promising direction for handling time-varying social relational contexts.
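To make the tag-prediction task concrete, here is one plausible way, assumed for illustration and not taken from the article's evaluation protocol, to turn community factors into predictions: score each user-tag pair by the inner product of the user's and the tag's community loadings, rank the tags, and measure precision@k against the tags the user actually uses in a later time window. The function names `predict_top_tags` and `precision_at_k`, the toy matrices, and the choice of precision@k are hypothetical.

```python
import numpy as np


def predict_top_tags(user_factors, tag_factors, k, seen=None):
    """Rank tags per user by the community-mediated score U @ T^T.

    user_factors: (n_users, K) nonnegative community memberships.
    tag_factors:  (n_tags, K) nonnegative tag loadings per community.
    seen: optional boolean (n_users, n_tags) mask of tags to exclude.
    """
    scores = user_factors @ tag_factors.T
    if seen is not None:
        scores = np.where(seen, -np.inf, scores)  # push excluded tags to the end
    order = np.argsort(-scores, axis=1, kind="stable")  # descending, ties by index
    return order[:, :k]


def precision_at_k(top_k, future):
    """Per-user fraction of the k predicted tags used in the future window."""
    hits = np.take_along_axis(future, top_k, axis=1)
    return hits.mean(axis=1)


U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
T = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])
future = np.array([[1, 0, 0],
                   [0, 0, 1]])

top = predict_top_tags(U, T, k=2)   # [[0, 2], [1, 2]]
print(precision_at_k(top, future))  # [0.5 0.5]
```

In a full pipeline, `U` and `T` would come from a factorization fitted on earlier time windows, and `future` would record tag usage in the held-out window.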

Information & Contributors

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 3
August 2011
119 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1993077
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2011
Accepted: 01 January 2011
Revised: 01 October 2010
Received: 01 July 2009
Published in TKDD Volume 5, Issue 3

Author Tags

  1. MetaFac
  2. community discovery
  3. dynamic social network analysis
  4. metagraph factorization
  5. nonnegative tensor factorization
  6. relational hypergraph

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 0
Reflects downloads up to 28 Feb 2025

Citations

Cited By

  • (2020) Multi-Source Information Fusion Based Heterogeneous Network Embedding. Information Sciences. DOI: 10.1016/j.ins.2020.05.012. Online publication date: May 2020.
  • (2019) ErLinkTopic: A generative probabilistic framework for analyzing regional communities in social networks. Vinh University Journal of Science, 48:2A. DOI: 10.56824/vujs.2019nt18. Online publication date: 15 Aug 2019.
  • (2019) DynaMo: Dynamic Community Detection by Incrementally Maximizing Modularity. IEEE Transactions on Knowledge and Data Engineering, (1-1). DOI: 10.1109/TKDE.2019.2951419. Online publication date: 2019.
  • (2019) Social community detection and message propagation scheme based on personal willingness in social network. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 23:15 (6267-6285). DOI: 10.1007/s00500-018-3283-x. Online publication date: 1 Aug 2019.
  • (2018) Multityped Community Discovery in Time-Evolving Heterogeneous Information Networks Based on Tensor Decomposition. Complexity, 2018 (38). DOI: 10.1155/2018/9653404. Online publication date: 1 Mar 2018.
  • (2018) Event Analytics via Discriminant Tensor Factorization. ACM Transactions on Knowledge Discovery from Data, 12:6 (1-38). DOI: 10.1145/3184455. Online publication date: 10 Oct 2018.
  • (2018) Detecting Topic Authoritative Social Media Users: A Multilayer Network Approach. IEEE Transactions on Multimedia, 20:5 (1195-1208). DOI: 10.1109/TMM.2017.2763324. Online publication date: May 2018.
  • (2018) Analysis of User Network and Correlation for Community Discovery Based on Topic-Aware Similarity and Behavioral Influence. IEEE Transactions on Human-Machine Systems, 48:6 (559-571). DOI: 10.1109/THMS.2017.2725341. Online publication date: Dec 2018.
  • (2018) A Micromodel to Predict Message Propagation for Twitter Users. 2018 International Conference on Data Science and Engineering (ICDSE), (1-5). DOI: 10.1109/ICDSE.2018.8527807. Online publication date: Aug 2018.
  • (2018) Hete_MESE: Multi-Dimensional Community Detection Algorithm Based on Multiplex Network Extraction and Seed Expansion for Heterogeneous Information Networks. IEEE Access, 6 (73965-73983). DOI: 10.1109/ACCESS.2018.2883638. Online publication date: 2018.
