
Community Discovery via Metagraph Factorization

Published: 01 August 2011

Abstract

This work aims to discover community structure in rich media social networks through the analysis of time-varying, multirelational data. Community structure represents the latent social context of user actions and has important applications such as search and recommendation. The problem is particularly relevant in the enterprise domain, where extracting emergent community structure from enterprise social media can help in forming new collaborative teams, in expertise discovery, and in the long-term reorganization of enterprises based on collaboration patterns. There are several unique challenges: (a) In social media, the context of user actions is constantly changing and coevolving; hence the social context contains time-evolving multidimensional relations. (b) The social context is determined by the available system features and is unique to each social media platform; hence the analysis of such data needs to flexibly incorporate various system features. In this article we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from dynamic, multidimensional social contexts and interactions. Our work has three key contributions: (1) the metagraph, a novel relational hypergraph representation for modeling multirelational and multidimensional social data; (2) an efficient multirelational factorization method for community extraction on a given metagraph; (3) an online method to handle time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from an enterprise and from the public Digg social media Web site suggest that our technique is scalable and is able to extract meaningful communities from social media contexts.
We illustrate the usefulness of our framework through two prediction tasks: (1) in the enterprise dataset, the task is to predict users’ future interests in tag usage, and (2) in the Digg dataset, the task is to predict users’ future interests in voting and commenting on Digg stories. Our predictions significantly outperform baseline methods (including the aspect model and tensor analysis), which indicates that metagraphs are a promising direction for handling time-varying social relational contexts.
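To make the idea of multirelational factorization concrete, the following is a minimal sketch and not the MetaFac algorithm itself, whose details are not given on this page. It swaps in a simpler, well-known technique: joint nonnegative matrix factorization with squared-error multiplicative updates. It assumes two hypothetical relations that share a user facet, a user-by-tag matrix `R1` and a user-by-story matrix `R2`, and fits a shared user factor `U` so that `R1 ≈ U Aᵀ` and `R2 ≈ U Bᵀ`. All names, the toy data, and the choice of objective are illustrative assumptions. The code uses only the Python standard library.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]

def sq_error(R, U, V):
    """Squared Frobenius error between R and U V^T."""
    approx = matmul(U, transpose(V))
    return sum((r - a) ** 2 for rr, ar in zip(R, approx) for r, a in zip(rr, ar))

def scale(X, num, den, eps=1e-9):
    """Elementwise multiplicative update: X * num / (den + eps)."""
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def joint_nmf(R1, R2, k, iters=200, seed=0):
    """Fit R1 ~ U A^T and R2 ~ U B^T with one shared nonnegative U.

    Illustrative assumption: squared-error objective, not MetaFac's own.
    """
    rng = random.Random(seed)

    def rand(n):
        # Strictly positive start so multiplicative updates can move.
        return [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]

    U, A, B = rand(len(R1)), rand(len(R1[0])), rand(len(R2[0]))
    losses = [sq_error(R1, U, A) + sq_error(R2, U, B)]
    for _ in range(iters):
        UtU = matmul(transpose(U), U)
        A = scale(A, matmul(transpose(R1), U), matmul(A, UtU))
        B = scale(B, matmul(transpose(R2), U), matmul(B, UtU))
        # The shared factor U sees evidence from both relations at once.
        gram = add(matmul(transpose(A), A), matmul(transpose(B), B))
        U = scale(U, add(matmul(R1, A), matmul(R2, B)), matmul(U, gram))
        losses.append(sq_error(R1, U, A) + sq_error(R2, U, B))
    return U, A, B, losses

# Hypothetical toy data: users 0-2 and users 3-5 touch disjoint tags and stories.
R1 = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3   # user x tag
R2 = [[2, 1, 0]] * 3 + [[0, 0, 3]] * 3         # user x story

U, A, B, losses = joint_nmf(R1, R2, k=2)
# Soft community memberships are the rows of U; a hard label is the largest entry.
labels = [row.index(max(row)) for row in U]
print(labels, round(losses[0], 3), round(losses[-1], 6))
```

Sharing `U` across both relations is what lets evidence from tagging and from story activity inform the same community memberships. A metagraph, as described in the abstract, generalizes this to relations over more than two facets, which this two-matrix sketch does not cover.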


Published in

ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 3
August 2011
119 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/1993077

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 August 2011
• Accepted: 1 January 2011
• Revised: 1 October 2010
• Received: 1 July 2009
