Scalable learning of collective behavior based on sparse social dimensions

Authors:
Lei Tang, Arizona State University, Tempe, AZ, USA
Huan Liu, Arizona State University, Tempe, AZ, USA
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management, November 2009, Pages 1107–1116
https://doi.org/10.1145/1645953.1646094

Published: 02 November 2009

ABSTRACT

The study of collective behavior seeks to understand how individuals behave in a social network environment. Oceans of data generated by social media such as Facebook, Twitter, Flickr and YouTube present both opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach is adopted to address the heterogeneity of connections present in social media. However, networks in social media are normally of colossal size, involving hundreds of thousands or even millions of actors. This scale demands scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the social-dimension-based approach can efficiently handle networks of millions of actors while demonstrating prediction performance comparable to that of other, non-scalable methods.
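
The edge-centric idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so everything concrete here is an assumption made for illustration: edges are clustered with a plain k-means, each edge is represented as a sparse 0/1 vector over its two endpoints, similarity is cosine, and the names `edge_cluster` and `social_dimensions` are hypothetical. It is not the authors' implementation. The point it shows is why the resulting social dimensions are sparse: a node can only belong to as many clusters as it has edges.

```python
import random
from collections import defaultdict


def edge_cluster(edges, k, iters=20, seed=0):
    """Assign each edge to one of k clusters with a simple k-means.

    Assumptions (illustrative only): edges is a list of (u, v) pairs with
    u != v, 1 <= k <= len(edges), and iters >= 1.
    """
    rng = random.Random(seed)
    # Initial centroids: k edges picked at random, as sparse node -> weight maps.
    centroids = [{u: 1.0, v: 1.0} for u, v in rng.sample(edges, k)]
    assign = None
    for _ in range(iters):
        norms = [sum(w * w for w in cen.values()) ** 0.5 for cen in centroids]
        new_assign = []
        for u, v in edges:
            best, best_sim = 0, -1.0
            for c, cen in enumerate(centroids):
                # Cosine similarity; an edge vector has norm sqrt(2).
                dot = cen.get(u, 0.0) + cen.get(v, 0.0)
                sim = dot / (norms[c] * 2 ** 0.5) if norms[c] else 0.0
                if sim > best_sim:
                    best, best_sim = c, sim
            new_assign.append(best)
        if new_assign == assign:  # converged: no edge changed cluster
            break
        assign = new_assign
        # Update step: each centroid becomes the mean of its member edges.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for (u, v), c in zip(edges, assign):
            sums[c][u] += 1.0
            sums[c][v] += 1.0
            counts[c] += 1
        centroids = [
            {n: s / counts[c] for n, s in sums[c].items()} if counts[c] else centroids[c]
            for c in range(k)
        ]
    return assign


def social_dimensions(edges, assign):
    """Map each node to the set of edge clusters it participates in."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)


# Toy network: two triangles sharing node 2, plus a pendant node 5.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
toy_dims = social_dimensions(toy_edges, edge_cluster(toy_edges, k=2))
print(toy_dims)
```

Because every edge lands in exactly one cluster, a node's number of social dimensions is bounded by the smaller of its degree and `k`. In a network where most actors have few connections, most actors therefore have only a handful of nonzero dimensions.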

Index Terms

Scalable learning of collective behavior based on sparse social dimensions
1. Applied computing
  1. Law, social and behavioral sciences
    1. Sociology
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN:9781605585123
DOI:10.1145/1645953
General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)
Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
behavior prediction
edge-centric clustering
relational learning
social dimensions
social media
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%