research-article

Detect inflated follower numbers in OSN using star sampling

Authors:
Hao Wang, University of Windsor, Windsor, Ontario, Canada
Jianguo Lu, University of Windsor, Windsor, Ontario, Canada
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2013, Pages 127–133. https://doi.org/10.1145/2492517.2492662

Published: 25 August 2013

ABSTRACT

The properties of online social networks (OSNs) are of great interest to the general public as well as to IT professionals. Often the raw data are not available, and the summaries released by the service providers are sketchy. Sampling is therefore needed to reveal the hidden properties of the underlying data. While uniform random sampling is often preferred, some properties, such as the top bloggers, need to be obtained using PPS (probability proportional to size) sampling. Although PPS sampling can be approximated using a simple random walk, doing so is inefficient because only one sample is taken at every step. This paper introduces an efficient sampling method, called star sampling, that takes all the neighbours of a sampled node as valid samples. It is more efficient than random walk sampling by a factor of the average degree. We derive the estimator and its variance, and verify the results on six large real networks stored locally, where the ground truth is known and the estimates can be evaluated.
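The idea of star sampling can be illustrated with a minimal sketch. It is not the estimator derived in the paper, which the abstract does not state. It assumes that seed users can be drawn uniformly at random with replacement, that the total number of users is known, and that each seed's followee list contains no duplicates. Under those assumptions a user with d followers is followed by a random seed with probability d divided by the number of users, so scaling the hit count gives an estimate of d. The names `followees`, `star_sample`, `estimate_followers`, and the toy graph are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def star_sample(followees, num_stars, rng):
    """Draw `num_stars` seed users uniformly at random (with replacement)
    and count how often each user appears among the seeds' followees.

    `followees` maps a user to the list of users that user follows.
    """
    users = list(followees)
    counts = Counter()
    for _ in range(num_stars):
        seed = rng.choice(users)
        # Every neighbour of the seed is kept as a sample (the "star").
        counts.update(followees[seed])
    return counts


def estimate_followers(counts, num_users, num_stars):
    """Scale hit counts into follower estimates.

    Assumption: a user with d followers is hit by one uniform seed with
    probability d / num_users, so hits * num_users / num_stars estimates d.
    """
    scale = num_users / num_stars
    return {user: hits * scale for user, hits in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: "hub" has three followers, "a" has one.
    toy = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": []}
    hits = star_sample(toy, 10_000, random.Random(0))
    print(estimate_followers(hits, len(toy), 10_000))
```

Because each seed contributes as many samples as it has followees, popular users are hit far more often than under uniform node sampling, which is what makes this kind of sampling suited to finding top accounts.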

We then apply our method to Weibo, the Chinese version of Twitter, whose properties are rarely studied despite its enormous size and influence. In addition to estimating conventional metrics such as size and degree distributions, we demonstrate that star sampling can identify the top ten thousand bloggers efficiently. In general, the estimated follower number is consistent with the claimed number, but there are cases where the follower numbers are inflated by a factor of up to 132.
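As a small worked illustration of the comparison described above, the sketch below reads "inflated by a factor" as the ratio of the claimed follower count to the estimated one. That reading, the function names `inflation_factor` and `flag_inflated`, and the default `threshold` are assumptions made for this example and are not taken from the paper.

```python
def inflation_factor(claimed, estimated):
    """Ratio of claimed to estimated followers; values well above 1
    suggest the claimed number is inflated."""
    if estimated <= 0:
        raise ValueError("estimated follower count must be positive")
    return claimed / estimated


def flag_inflated(claimed_by_user, estimated_by_user, threshold=2.0):
    """Return users whose inflation factor exceeds `threshold`.

    `threshold` is a hypothetical cut-off; users without an estimate
    are skipped rather than guessed at.
    """
    flagged = {}
    for user, claimed in claimed_by_user.items():
        estimated = estimated_by_user.get(user)
        if estimated is None:
            continue
        factor = inflation_factor(claimed, estimated)
        if factor > threshold:
            flagged[user] = factor
    return flagged
```

A factor close to 1 corresponds to the common case in which the estimate agrees with the claimed number.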

Index Terms

Detect inflated follower numbers in OSN using star sampling
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Web-based interaction
2. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining

Published in
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517
General Chairs: Jon Rokne (University of Calgary, Calgary, AB, Canada), Christos Faloutsos (Carnegie Mellon University, Pittsburgh, PA)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph sampling
online social network
sampling
weibo
Acceptance Rates
Overall Acceptance Rate: 116 of 549 submissions, 21%