Abstract
This work aims to discover community structure in rich media social networks through analysis of time-varying, multirelational data. Community structure represents the latent social context of user actions and has important applications such as search and recommendation. The problem is particularly relevant in the enterprise domain, where extracting emergent community structure from enterprise social media can help in forming new collaborative teams, in expertise discovery, and in the long-term reorganization of enterprises based on collaboration patterns. There are several unique challenges: (a) In social media, the context of user actions is constantly changing and coevolving; hence the social context contains time-evolving multidimensional relations. (b) The social context is determined by the available system features and is unique to each social media platform; hence the analysis of such data needs to flexibly incorporate various system features. In this article we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from dynamic, multidimensional social contexts and interactions. Our work makes three key contributions: (1) the metagraph, a novel relational hypergraph representation for modeling multirelational and multidimensional social data; (2) an efficient multirelational factorization method for community extraction on a given metagraph; (3) an online method that handles time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from an enterprise and from the public Digg social media Web site suggest that our technique is scalable and is able to extract meaningful communities from social media contexts.
We illustrate the usefulness of our framework through two prediction tasks: (1) in the enterprise dataset, the task is to predict users’ future interests in tag usage, and (2) in the Digg dataset, the task is to predict users’ future interests in voting and commenting on Digg stories. Our predictions significantly outperform baseline methods (including the aspect model and tensor analysis), indicating that metagraphs are a promising direction for handling time-varying social relational contexts.
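To make the idea of factorizing a metagraph more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a metagraph can be stored as a dictionary of relations, each linking a tuple of named facets (such as user, tag, or doc) to a nonnegative data array, and it fits shared community factors with EM-style updates for a KL-divergence objective. The function names `reconstruct` and `metagraph_factorize`, the toy data, and the choice of update rule are all assumptions made for illustration; the incremental (online) variant is not covered.

```python
import numpy as np

LETTERS = "abcdefg"  # one einsum index letter per facet of a relation


def reconstruct(z, factors):
    """Return sum_k z[k] * outer(factors[0][:, k], factors[1][:, k], ...)."""
    idx = LETTERS[:len(factors)]
    spec = ",".join(f"{l}k" for l in idx) + ",k->" + idx
    return np.einsum(spec, *factors, z)


def metagraph_factorize(relations, facet_sizes, K, iters=200, seed=0):
    """Hypothetical helper: joint nonnegative factorization of all relations.

    relations:   dict name -> (tuple of facet names, nonnegative ndarray)
    facet_sizes: dict facet name -> number of entities in that facet
                 (every facet is assumed to appear in at least one relation)
    Returns community weights z (length K) and one column-stochastic
    factor matrix per facet, shared by every relation that uses the facet.
    """
    rng = np.random.default_rng(seed)
    U = {q: rng.random((n, K)) for q, n in facet_sizes.items()}
    for q in U:
        U[q] /= U[q].sum(axis=0, keepdims=True)
    z = np.full(K, 1.0 / K)
    # Scale each relation to unit mass so that relations are weighted equally.
    data = {name: (f, X / X.sum()) for name, (f, X) in relations.items()}
    eps = 1e-12
    for _ in range(iters):
        z_acc = np.zeros(K)
        U_acc = {q: np.zeros_like(U[q]) for q in U}
        for facets, X in data.values():
            fs = [U[q] for q in facets]
            idx = LETTERS[:len(facets)]
            ratio = X / (reconstruct(z, fs) + eps)
            inputs = idx + "," + ",".join(f"{l}k" for l in idx) + ",k->"
            # Expected mass that each community explains, in total and per entity.
            z_acc += np.einsum(inputs + "k", ratio, *fs, z)
            for m, q in enumerate(facets):
                U_acc[q] += np.einsum(inputs + idx[m] + "k", ratio, *fs, z)
        z = z_acc / z_acc.sum()
        for q in U:
            U[q] = U_acc[q] / (U_acc[q].sum(axis=0, keepdims=True) + eps)
    return z, U


# Toy metagraph: two relations that share the "user" facet.
X_ut = np.zeros((6, 4))          # user x tag counts
X_ut[:3, :2] = 1
X_ut[3:, 2:] = 1
X_ud = np.zeros((6, 5))          # user x doc counts
X_ud[:3, :2] = 1
X_ud[3:, 2:] = 1
relations = {
    "tagging": (("user", "tag"), X_ut),
    "commenting": (("user", "doc"), X_ud),
}
z, U = metagraph_factorize(relations, {"user": 6, "tag": 4, "doc": 5}, K=2)
labels = U["user"].argmax(axis=1)  # dominant community per user
print(labels)
```

In this toy setup, users 0 to 2 and users 3 to 5 interact with disjoint sets of tags and documents, so the dominant community per user should separate the two groups. Sharing one factor matrix per facet across relations is what lets evidence from tagging and commenting inform the same community assignment.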