Discover millions of fake followers in Weibo

Zhang, Yi; Lu, Jianguo

doi:10.1007/s13278-016-0324-2

Original Article
Published: 31 March 2016

Social Network Analysis and Mining, Volume 6, article number 16 (2016)

Yi Zhang¹ & Jianguo Lu¹

Abstract

Weibo, the Chinese counterpart of Twitter, has attracted hundreds of millions of users. Like other online social networks (OSNs), Weibo has a large number of fake accounts. They are created to sell their following links to customers who want to boost their follower counts. These bogus accounts are difficult to identify individually, especially when they are created by sophisticated programs or controlled directly by human beings. This paper proposes a novel fake account detection method based on the very purpose of these accounts: they are created to follow their targets en masse, which results in high overlap between the follower lists of their customers. This paper investigates the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates). Discovering near-duplicates is a challenging task: the network is large, the data in its entirety are not available, and pairwise comparison is very expensive. We developed a sampling-based approach to discover all the near-duplicates among the top accounts, that is, those with at least 50,000 followers. In the experiment, we found 395 near-duplicates, which lead us to 11.90 million fake accounts (4.56% of total users) that send 741.10 million links (9.50% of all edges). Furthermore, we characterize four typical structures of the spammers, cluster these spammers into 34 groups, and analyze the properties of each group.
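
The idea of comparing follower lists on a sample can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this abstract. It assumes that overlap is measured by Jaccard similarity (the measure named in the Acknowledgments) and that each top account's follower list has already been restricted to a uniform random sample of users. The function names, the input format, and the threshold value are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; taken as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def near_duplicates(sampled_followers: dict, threshold: float = 0.9) -> list:
    """Return (account, account, similarity) for pairs at or above the threshold.

    sampled_followers maps an account ID to the set of its followers that fall
    inside a uniform random sample of users (hypothetical input format).
    The threshold is an illustrative value, not one taken from the paper.
    """
    pairs = []
    for u, v in combinations(sorted(sampled_followers), 2):
        sim = jaccard(sampled_followers[u], sampled_followers[v])
        if sim >= threshold:
            pairs.append((u, v, sim))
    return pairs


# Toy data: accounts "A" and "B" share 9 of their 11 distinct sampled followers.
toy = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},
    "C": {20, 21, 22},
}
result = near_duplicates(toy, threshold=0.8)
```

This sketch still compares every pair of accounts, so it only shows why sampling helps: each comparison touches the sampled follower sets rather than the full lists. Avoiding the quadratic number of comparisons would need an additional technique that the abstract does not describe.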

Notes

http://jlu.myweb.cs.uwindsor.ca/spammer/view_node-1787709495.

Acknowledgments

This work is supported by an NSERC Discovery grant. We would like to thank Hao Wang for collecting the uniform random sample of Weibo that is used in this paper, and for his participation in the calculation of Jaccard similarity on this data.

Author information

Authors and Affiliations

School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Canada
Yi Zhang & Jianguo Lu

Corresponding author

Correspondence to Yi Zhang.

About this article

Cite this article

Zhang, Y., Lu, J. Discover millions of fake followers in Weibo. Soc. Netw. Anal. Min. 6, 16 (2016). https://doi.org/10.1007/s13278-016-0324-2

Received: 13 March 2015
Revised: 03 March 2016
Accepted: 05 March 2016
Published: 31 March 2016
DOI: https://doi.org/10.1007/s13278-016-0324-2
