DOI: 10.1145/1645953.1646094
research-article

Scalable learning of collective behavior based on sparse social dimensions

Published: 02 November 2009

ABSTRACT

The study of collective behavior seeks to understand how individuals behave in a social network environment. Oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present both opportunities and challenges for studying collective behavior at a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach is adopted to address the heterogeneity of connections present in social media. However, networks in social media are normally of colossal size, involving hundreds of thousands or even millions of actors. Networks of this scale require scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the social-dimension-based approach can efficiently handle networks of millions of actors while demonstrating prediction performance comparable to that of other, non-scalable methods.
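
The abstract names an edge-centric clustering scheme but does not spell out its details. As a rough illustration only, the sketch below assumes that each edge is represented by an indicator vector over its two endpoints and that edges are grouped with a plain k-means-style loop; both choices, along with the function names `edge_cluster` and `social_dimensions` and the toy graph, are assumptions made for this example and are not taken from the paper. It shows why clustering edges rather than nodes yields sparse social dimensions: a node can have nonzero entries only for the clusters its own edges fall into.

```python
from collections import defaultdict
import random


def _similarity(edge, centroid):
    """Cosine similarity between an edge's endpoint-indicator vector and a
    centroid, up to the constant edge norm sqrt(2)."""
    u, v = edge
    norm = sum(w * w for w in centroid.values()) ** 0.5
    return (centroid.get(u, 0.0) + centroid.get(v, 0.0)) / norm if norm else 0.0


def edge_cluster(edges, k, iters=20, seed=0):
    """Cluster edges with a k-means-style loop; return one cluster id per edge.

    Assumption for this sketch: an edge (u, v) is a sparse vector with a 1 at
    u and a 1 at v. Requires k <= len(edges).
    """
    rng = random.Random(seed)
    # Seed each centroid with the vector of a randomly chosen edge.
    centroids = [{u: 1.0, v: 1.0} for u, v in rng.sample(edges, k)]
    assign = [-1] * len(edges)
    for _ in range(iters):
        # Assignment step: each edge goes to its most similar centroid.
        new_assign = [
            max(range(k), key=lambda j: _similarity(e, centroids[j]))
            for e in edges
        ]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: each centroid becomes the mean of its edges' vectors.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for (u, v), j in zip(edges, assign):
            sums[j][u] += 1.0
            sums[j][v] += 1.0
            counts[j] += 1
        for j in range(k):
            if counts[j]:  # keep the old centroid if a cluster went empty
                centroids[j] = {n: s / counts[j] for n, s in sums[j].items()}
    return assign


def social_dimensions(edges, assign):
    """Map each node to a sparse dict {cluster id: share of its edges there}."""
    counts = defaultdict(lambda: defaultdict(float))
    for (u, v), j in zip(edges, assign):
        counts[u][j] += 1.0
        counts[v][j] += 1.0
    return {
        n: {j: c / sum(row.values()) for j, c in row.items()}
        for n, row in counts.items()
    }


# Toy graph (hypothetical): two triangles that share node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
assign = edge_cluster(edges, k=2)
dims = social_dimensions(edges, assign)
for node in sorted(dims):
    print(node, dims[node])
```

Under these assumptions, a node of degree d has at most min(d, k) nonzero dimensions, so low-degree nodes, which dominate large social networks, keep very short feature vectors even when k is large.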

      • Published in

        CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
        November 2009
        2162 pages
        ISBN: 9781605585123
        DOI: 10.1145/1645953

        Copyright © 2009 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
