Discover millions of fake followers in Weibo

  • Original Article
  • Social Network Analysis and Mining

Abstract

Weibo, the Chinese counterpart of Twitter, has attracted hundreds of millions of users. Like other Online Social Networks (hereafter OSNs), Weibo has a large number of fake accounts, created to sell following links to customers who want to boost their follower counts. These bogus accounts are difficult to identify individually, especially when they are created by sophisticated programs or controlled directly by human beings. This paper proposes a novel fake account detection method based on the very purpose of these accounts: they are created to follow their targets en masse, which results in high overlap between the follower lists of their customers. We investigate the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates). Discovering near-duplicates is challenging: the network is large, the data in its entirety are not available, and pair-wise comparison is very expensive. We developed a sampling-based approach to discover all the near-duplicates among the top accounts, defined as those with at least 50,000 followers. In the experiment, we found 395 near-duplicates, which leads us to 11.90 million fake accounts (4.56% of all users) that send 741.10 million links (9.50% of all edges). Furthermore, we characterize four typical structures of the spammers, cluster the spammers into 34 groups, and analyze the properties of each group.
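
The core idea, comparing follower lists and flagging pairs that nearly coincide, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the 0.5 threshold are assumptions made for illustration, and Jaccard similarity is used because the acknowledgments mention it. Each follower list is restricted to a sample of users before comparison, so the similarity is an estimate computed on the sample rather than on the full lists.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(followers: dict, sample: set, threshold: float = 0.5) -> list:
    """Return (account, account, estimated similarity) for pairs at or above threshold.

    Follower lists are intersected with the sampled users first, so only
    sampled followers take part in the pair-wise comparison.
    """
    sampled = {acct: fl & sample for acct, fl in followers.items()}
    pairs = []
    for u, v in combinations(sorted(sampled), 2):
        s = jaccard(sampled[u], sampled[v])
        if s >= threshold:
            pairs.append((u, v, s))
    return pairs


# Toy data (hypothetical): accounts A and B share most followers, C does not.
followers = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {1, 2, 3, 4, 5, 7},
    "C": {8, 9, 10},
}
sample = {1, 2, 3, 6, 7, 8}  # stand-in for a uniform random sample of users
result = near_duplicates(followers, sample, threshold=0.5)
```

In this toy run only the pair (A, B) passes the threshold. The sketch still compares every pair of accounts, which is the expensive step the abstract refers to; sampling only shrinks the sets being compared.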

Notes

  1. http://jlu.myweb.cs.uwindsor.ca/spammer/view_node-1787709495.

Acknowledgments

This work is supported by an NSERC Discovery grant. We would like to thank Hao Wang for collecting the uniform random sample of Weibo used in this paper, and for his participation in calculating the Jaccard similarity on this data.

Author information

Corresponding author

Correspondence to Yi Zhang.

About this article

Cite this article

Zhang, Y., Lu, J. Discover millions of fake followers in Weibo. Soc. Netw. Anal. Min. 6, 16 (2016). https://doi.org/10.1007/s13278-016-0324-2

  • DOI: https://doi.org/10.1007/s13278-016-0324-2
