ABSTRACT
Most past research on online social networks (OSNs) has focused on static friendship networks, but social network activity graphs are important as well. Characterizing activity graphs is computationally intensive, so reducing their size with sampling algorithms is critical. A sampling algorithm must meet two requirements: it must preserve core graph characteristics, and it must be amenable to a streaming implementation, since activity graphs naturally evolve in a streaming fashion. Existing approaches satisfy one requirement or the other, but not both. In this paper, we propose a novel sampling algorithm called Streaming Time Node Sampling (STNS) that exploits the temporal clustering often found in real social networks. Using real communication data collected from Facebook and Twitter, we show that STNS significantly outperforms state-of-the-art sampling mechanisms such as node sampling and Forest Fire sampling, across both averages and distributions of several graph properties.
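To make the streaming requirement concrete, the sketch below shows one way the plain node sampling baseline named in the abstract could be run over a stream of activity records in a single pass. It is not STNS, whose details the abstract does not give. The record layout `(sender, receiver, timestamp)`, the function names, and the hash-based inclusion rule are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (sender, receiver, unix timestamp).
Edge = Tuple[str, str, int]


def keep_node(node: str, fraction: float) -> bool:
    """Deterministically decide whether a node belongs to the sample."""
    digest = hashlib.sha256(node.encode("utf-8")).digest()
    # Read the first 8 bytes of the hash as an integer in [0, 2**64)
    # and keep the node if it falls below fraction * 2**64.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


def stream_node_sample(edges: Iterable[Edge], fraction: float) -> Iterator[Edge]:
    """Yield activity edges whose two endpoints are both in the node sample.

    Baseline node sampling only (an assumption for illustration, not STNS):
    each edge is examined once and no per-node state is stored.
    """
    for sender, receiver, timestamp in edges:
        if keep_node(sender, fraction) and keep_node(receiver, fraction):
            yield (sender, receiver, timestamp)
```

Because the inclusion decision depends only on the node identifier, the same node is kept or dropped consistently wherever it appears in the stream. This sketch ignores timestamps entirely, so it does nothing to exploit the temporal clustering that the abstract says STNS relies on.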
Index Terms
- Time-based sampling of social network activity graphs