ABSTRACT
The study of collective behavior seeks to understand how individuals behave in a social network environment. The oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present both opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? We adopt a social-dimension-based approach to address the heterogeneity of connections in social media. However, social media networks are normally of colossal size, involving hundreds of thousands or even millions of actors, and this scale demands scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the social-dimension-based approach can efficiently handle networks of millions of actors while achieving prediction performance comparable to that of non-scalable methods.
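To make the idea of edge-centric clustering concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each edge is treated as a sparse instance whose only nonzero features are its two endpoints, that edges are grouped with a plain k-means using cosine similarity, and that an actor's social dimensions are the clusters of its incident edges. The function name `edge_centric_dimensions` and its parameters are hypothetical; the abstract does not specify the clustering algorithm or similarity measure.

```python
import math
import random
from collections import defaultdict


def edge_centric_dimensions(edges, k, n_iter=20, seed=0):
    """Cluster edges with a plain k-means; return {node: set of cluster ids}.

    Hypothetical helper: assumes a simple undirected graph given as a list
    of (u, v) pairs with k <= len(edges).
    """
    rng = random.Random(seed)
    # An edge (u, v) is a sparse instance whose only nonzero features are u and v.
    centroids = [{u: 1.0, v: 1.0} for u, v in rng.sample(edges, k)]
    assign = [0] * len(edges)

    for _ in range(n_iter):
        norms = [math.sqrt(sum(w * w for w in c.values())) for c in centroids]

        # Assignment step: choose the centroid with the highest cosine similarity.
        # The edge's own norm is the same for every edge, so it is left out.
        for i, (u, v) in enumerate(edges):
            scores = [
                (c.get(u, 0.0) + c.get(v, 0.0)) / n
                for c, n in zip(centroids, norms)
            ]
            assign[i] = max(range(k), key=scores.__getitem__)

        # Update step: each centroid becomes the mean of its assigned edges.
        sums = [defaultdict(float) for _ in range(k)]
        sizes = [0] * k
        for (u, v), c in zip(edges, assign):
            sums[c][u] += 1.0
            sums[c][v] += 1.0
            sizes[c] += 1
        for c in range(k):
            if sizes[c]:  # keep the old centroid if a cluster went empty
                centroids[c] = {n: w / sizes[c] for n, w in sums[c].items()}

    # A node belongs to every cluster that contains one of its edges.
    dims = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)


# Toy graph: two triangles joined by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_dims = edge_centric_dimensions(toy_edges, k=2)
```

Because a node can only join the clusters of its own edges, the number of dimensions per node is bounded by its degree. In this sketch, that bound is what keeps the resulting representation sparse when most actors have few connections.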
Index Terms
- Scalable learning of collective behavior based on sparse social dimensions