Detecting cooperative and organized spammer groups in micro-blogging community

Data Mining and Knowledge Discovery

Abstract

In recent years, social spammers have become rampant and have evolved into many variants across most social networks. In the micro-blogging community, one typical type of anomalous group consists of cooperative, organized spammers who are hired by public relations companies and paid to post tweets with specified content. They intentionally evolve their content and behavior patterns to avoid detection, and they cooperatively hijack trending topics to push a deliberate point of view, which can seriously affect people’s judgments and decisions. Because these spammers evolve and hide their behavior, detecting such groups raises two issues. The first is detecting the anomalous topics hijacked by spammer groups among numerous trending topics. The second is identifying the members of a spammer group among the users who join the anomalous topics. In this paper, we propose a two-stage topology-based method to detect spammer groups that are partially distributed across multiple trending topics. In the first stage, we detect the anomalous topics among the many trending topics using a new similarity measure based on subgraph ranking. A topic is identified as anomalous if the topological characteristics of its retweeting networks change dramatically between adjacent periods. In the second stage, starting from a few initially labeled spammers and employing the basic idea of label propagation, we obtain several anomalous topic sequences, and we cluster the users who join each topic sequence into group spammers and normal users by their total authorities. A user's total authority is the weighted cumulative sum of his or her authorities in the anomalous topics of a topic sequence, and the authority in each topic is defined from the user's out-degree in the retweeting network. Experimental results on real-world data collected from the Sina micro-blogging site demonstrate that our similarity measure leads on all evaluation metrics, and that our method detects group spammers effectively in comparison with other methods.
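The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the subgraph-ranking similarity, the topic weights, or the clustering procedure, so the sketch substitutes simple stand-ins. Stage one uses the Jaccard overlap of the top-k users by out-degree in adjacent periods, and stage two uses a normalised out-degree as the per-topic authority followed by a one-dimensional two-means split. All function names, the `(source, target)` edge convention, the threshold, and the weights are assumptions made for illustration.

```python
from collections import Counter


def out_degrees(edges):
    """Count outgoing edges per user; edges are assumed (source, target) pairs."""
    return Counter(src for src, _ in edges)


def topk_overlap(edges_prev, edges_next, k=3):
    """Jaccard overlap of the top-k users by out-degree in two adjacent periods.

    Stand-in for the paper's subgraph-ranking similarity, which the
    abstract does not specify.
    """
    top_prev = {u for u, _ in out_degrees(edges_prev).most_common(k)}
    top_next = {u for u, _ in out_degrees(edges_next).most_common(k)}
    union = top_prev | top_next
    return len(top_prev & top_next) / len(union) if union else 1.0


def is_anomalous(periods, threshold=0.2, k=3):
    """Flag a topic when any pair of adjacent periods is less similar than threshold."""
    return any(topk_overlap(a, b, k) < threshold
               for a, b in zip(periods, periods[1:]))


def total_authority(topic_sequence, weights):
    """Weighted sum, over the topics of a sequence, of normalised out-degree.

    The normalisation and the weights are assumptions; the abstract only
    says that authority is based on out-degree and accumulated with weights.
    """
    totals = Counter()
    for edges, weight in zip(topic_sequence, weights):
        degrees = out_degrees(edges)
        n_edges = sum(degrees.values())
        for user, degree in degrees.items():
            totals[user] += weight * degree / n_edges
    return dict(totals)


def split_by_authority(totals, iters=20):
    """One-dimensional two-means split of users into 'low' and 'high' clusters."""
    values = list(totals.values())
    lo, hi = min(values), max(values)
    for _ in range(iters):
        high = [v for v in values if abs(v - hi) < abs(v - lo)]
        low = [v for v in values if abs(v - hi) >= abs(v - lo)]
        if high:
            hi = sum(high) / len(high)
        if low:
            lo = sum(low) / len(low)
    return {u: "high" if abs(v - hi) < abs(v - lo) else "low"
            for u, v in totals.items()}


# Toy data: the most active users change completely between two periods.
period_1 = [("a", "x"), ("a", "y"), ("b", "x")]
period_2 = [("c", "x"), ("c", "y"), ("d", "x")]
flagged = is_anomalous([period_1, period_2], threshold=0.2, k=2)
totals = total_authority([period_1, period_2], weights=[0.5, 0.5])
clusters = split_by_authority(totals)
```

The sketch only separates users into a low-authority and a high-authority cluster; which cluster corresponds to group spammers is determined by the paper's method and is not inferred here.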

Notes

  1. http://www.chinadaily.com.cn/china/2010-06/17/content_9981056.htm.

  2. https://www.technologyreview.com/s/426174/undercover-researchers-expose-chinese-internet-water-army/.

  3. http://open.weibo.com/.

  4. http://weibo.com/hottopic.

Acknowledgements

The research presented in this paper is supported in part by the National Natural Science Foundation (61572397, 61502383, 61375040, 61571360), Fundamental Research Project of Natural Science in Shaanxi Province (2015JM6298, 2015JM6299) and Specialized Research Plan Funded Project of Shaanxi Province Department of Education (15JK1505).

Author information

Corresponding author

Correspondence to Feng Gao.

Additional information

Responsible editor: G. Karypis.

About this article

Cite this article

Dang, Q., Zhou, Y., Gao, F. et al. Detecting cooperative and organized spammer groups in micro-blogging community. Data Min Knowl Disc 31, 573–605 (2017). https://doi.org/10.1007/s10618-016-0479-5

  • DOI: https://doi.org/10.1007/s10618-016-0479-5
