Detecting cooperative and organized spammer groups in micro-blogging community

Dang, Qi; Zhou, Yadong; Gao, Feng; Sun, Qindong

doi:10.1007/s10618-016-0479-5

Published: 25 November 2016

Volume 31, pages 573–605 (2017)

Data Mining and Knowledge Discovery

Qi Dang¹,
Yadong Zhou¹,
Feng Gao² &
…
Qindong Sun³

Abstract

In recent years, social spammers have become rampant and have evolved into a number of variants in most social networks. In micro-blogging communities, a typical type of anomalous group consists of cooperative and organized spammers who are hired by public relations companies and paid to post tweets with certain content. They intentionally evolve their content and behavior patterns to avoid detection, and they cooperatively hijack trending topics with a deliberate point of view, which can seriously affect people’s judgments and decisions. Because these spammers evolve and hide their behavior, detecting such spammer groups requires solving two problems. The first is to detect the anomalous topics hijacked by spammer groups among numerous trending topics. The second is to identify the members of a spammer group among the users who join the anomalous topics. In this paper, we propose a two-stage topology-based method to detect spammer groups partially distributed across multiple trending topics. In the first stage, we detect anomalous topics among many trending topics using a new similarity measure based on subgraph ranking. A topic is identified as anomalous if the topological characteristics of its retweeting networks change dramatically between adjacent periods. In the second stage, we obtain several anomalous topic sequences from a few initially labeled spammers by employing the basic idea of label propagation, and cluster the users who join each topic sequence into group spammers and normal users by their total authorities. A user’s total authority is his/her weighted cumulative authority over the anomalous topics of each topic sequence, and the authority in each topic is defined based on the user’s out-degree in the retweeting network. Experimental results on real-world data collected from the Sina micro-blogging site demonstrate that our similarity measure leads in all evaluation metrics, and that our method detects group spammers effectively compared with other methods.
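
To make the second stage concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact formulas, so several choices here are assumptions: authority in a topic is taken as a user's share of the total out-degree, edges are (source, target) pairs with out-degree counted on the source, topic weights default to 1, and the users are split with a simple one-dimensional two-means procedure in which the higher-scoring cluster is treated as the suspected group. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def topic_authority(edges):
    """Authority per user in one topic (assumed: share of total out-degree)."""
    out_deg = Counter(src for src, _dst in edges)
    total = sum(out_deg.values())
    users = {u for edge in edges for u in edge}
    if total == 0:
        return {u: 0.0 for u in users}
    return {u: out_deg.get(u, 0) / total for u in users}


def total_authority(topic_edges, weights=None):
    """Weighted sum of per-topic authorities over one topic sequence."""
    if weights is None:
        weights = [1.0] * len(topic_edges)  # assumed uniform weights
    totals = defaultdict(float)
    for w, edges in zip(weights, topic_edges):
        for user, auth in topic_authority(edges).items():
            totals[user] += w * auth
    return dict(totals)


def split_two_groups(scores, iters=50):
    """1-D two-means split; returns the users in the higher-scoring cluster."""
    values = list(scores.values())
    lo, hi = min(values), max(values)
    if lo == hi:
        return set()  # all scores equal: nothing to separate
    for _ in range(iters):
        high = [v for v in values if abs(v - hi) < abs(v - lo)]
        low = [v for v in values if abs(v - hi) >= abs(v - lo)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return {u for u, v in scores.items() if abs(v - hi) < abs(v - lo)}


# Toy topic sequence of two retweeting networks (made-up data).
topic_edges = [
    [("s1", "a"), ("s1", "b"), ("s2", "a"), ("n1", "s1")],
    [("s1", "c"), ("s2", "c"), ("s2", "d"), ("n2", "c")],
]
totals = total_authority(topic_edges)
suspected = split_two_groups(totals)
print(sorted(suspected))
```

In this toy data, s1 and s2 accumulate out-degree in both topics, so their totals separate from those of users who appear only once. Whether a high or a low total authority marks a spammer depends on the edge direction convention of the retweeting network, which the abstract does not specify.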

Acknowledgements

The research presented in this paper is supported in part by the National Natural Science Foundation (61572397, 61502383, 61375040, 61571360), Fundamental Research Project of Natural Science in Shaanxi Province (2015JM6298, 2015JM6299) and Specialized Research Plan Funded Project of Shaanxi Province Department of Education (15JK1505).

Author information

Authors and Affiliations

Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, People’s Republic of China
Qi Dang & Yadong Zhou
Institute of Systems Engineering, Xi’an Jiaotong University, Xi’an, People’s Republic of China
Feng Gao
Xi’an University of Technology, Xi’an, People’s Republic of China
Qindong Sun

Corresponding author

Correspondence to Feng Gao.

Additional information

Responsible editor: G. Karypis.

About this article

Cite this article

Dang, Q., Zhou, Y., Gao, F. et al. Detecting cooperative and organized spammer groups in micro-blogging community. Data Min Knowl Disc 31, 573–605 (2017). https://doi.org/10.1007/s10618-016-0479-5

Received: 27 November 2015
Accepted: 17 September 2016
Published: 25 November 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10618-016-0479-5
