DOI: 10.1145/2492517.2492662
research-article

Detect inflated follower numbers in OSN using star sampling

Published: 25 August 2013

ABSTRACT

The properties of online social networks (OSNs) are of great interest to the general public as well as to IT professionals. The raw data are often unavailable, and the summaries released by service providers are sketchy, so sampling is needed to reveal the hidden properties of the underlying data. While uniform random sampling is often preferred, some properties, such as the set of top bloggers, must be obtained using PPS (probability proportional to size) sampling. PPS sampling can be approximated by a simple random walk, but this is inefficient because only one sample is taken at each step. This paper introduces an efficient sampling method, called star sampling, that takes all the neighbours of each visited node as valid samples. It is more efficient than random walk sampling by a factor equal to the average degree. We derive the estimator and its variance, and verify the results locally on six large real networks whose ground truth is known, so that the estimates can be evaluated.
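
The efficiency argument can be illustrated with a toy comparison. The sketch below assumes an undirected graph stored as an adjacency dict and uniform random access to node IDs; the function names `random_walk_sample` and `star_sample` are hypothetical, and this is not the authors' implementation. Both procedures sample nodes with probability proportional to degree, but the walk yields one sample per step while star sampling yields every neighbour of the drawn node.

```python
import random
from collections import Counter


def random_walk_sample(adj, start, steps, rng):
    """Simple random walk: exactly one sampled node per step.

    On a connected, non-bipartite undirected graph the long-run visit
    frequency of node v is deg(v) / (2|E|), i.e. probability proportional
    to degree (PPS). Consecutive samples are correlated.
    """
    samples = []
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbour
        samples.append(node)
    return samples


def star_sample(adj, num_centres, rng):
    """Star sampling sketch: each step draws a centre uniformly at random
    and records all of its neighbours as samples.

    Assumes uniform random access to node IDs and a simple graph without
    self-loops. Node v is recorded whenever one of its neighbours is drawn
    as a centre, so its expected count is num_centres * deg(v) / N: again
    proportional to degree, but each step yields deg(centre) samples.
    """
    nodes = list(adj)
    samples = []
    for _ in range(num_centres):
        centre = rng.choice(nodes)  # uniform node, drawn with replacement
        samples.extend(adj[centre])  # every neighbour is a PPS sample
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy undirected graph with degrees 3, 2, 2, 1 (average degree 2).
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
    walk = random_walk_sample(adj, start=0, steps=1000, rng=rng)
    star = star_sample(adj, num_centres=1000, rng=rng)
    print("random walk:", len(walk), "samples from 1000 steps")
    print("star sampling:", len(star), "samples from 1000 centres")
    print("star counts by node:", sorted(Counter(star).items()))
```

On this toy graph (average degree 2), star sampling returns roughly twice as many samples as the walk for the same number of node lookups, and its samples are not chained together the way consecutive walk steps are.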

We then apply our method to Weibo, the Chinese version of Twitter, whose properties are rarely studied despite its enormous size and influence. Along with conventional metrics such as network size and degree distribution, we demonstrate that star sampling can efficiently identify ten thousand top bloggers. In general, the estimated follower numbers are consistent with the claimed numbers, but in some cases the follower numbers are inflated by a factor of up to 132.
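
To illustrate how star samples can flag inflated follower counts, the sketch below assumes a directed follow graph with a known total number of accounts, star centres drawn uniformly at random with replacement, and full access to each centre's followee list. The functions `estimate_followers` and `inflation_factor` and the simulated population are hypothetical, and the binomial standard error is a simple plug-in approximation, not the estimator variance derived in the paper.

```python
import math
import random


def estimate_followers(followee_lists, total_users):
    """Estimate follower counts from star samples of a directed follow graph.

    followee_lists: one list per star centre, each centre drawn uniformly at
    random (with replacement) from all total_users accounts and holding the
    IDs that centre follows. A user with F followers is followed by a random
    centre with probability p = F / total_users, so the number of centres
    following the user is Binomial(n, p) and total_users * count / n is an
    unbiased estimate of F. Returns {user: (estimate, standard_error)}.
    """
    n = len(followee_lists)
    counts = {}
    for followees in followee_lists:
        for v in set(followees):  # a centre follows each user at most once
            counts[v] = counts.get(v, 0) + 1
    result = {}
    for v, c in counts.items():
        p = c / n  # estimated fraction of all accounts that follow v
        estimate = total_users * p
        std_err = total_users * math.sqrt(p * (1 - p) / n)  # binomial plug-in
        result[v] = (estimate, std_err)
    return result


def inflation_factor(claimed, estimated):
    """Ratio of the claimed follower count to the estimated count."""
    return claimed / estimated if estimated > 0 else math.inf


if __name__ == "__main__":
    rng = random.Random(7)
    total_users = 10_000

    # Hypothetical population: account 0 has 2,000 real followers
    # (IDs 1..2000) but claims 264,000.
    def followees_of(user):
        return [0] if 1 <= user <= 2000 else []

    centres = [rng.randrange(total_users) for _ in range(5_000)]
    samples = [followees_of(u) for u in centres]
    est, se = estimate_followers(samples, total_users)[0]
    print(f"estimated followers: {est:.0f} +/- {se:.0f}")
    print(f"inflation factor: {inflation_factor(264_000, est):.1f}")
```

An inflation factor near 1 means the claimed count agrees with the estimate. Because the relative error of a binomial count shrinks as the follower fraction grows, this kind of check is most reliable for heavily followed accounts such as top bloggers.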

Published in
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 116 of 549 submissions (21%)
