Abstract
In recent years, social spammers have become rampant in most social networks and have evolved many variants. In the micro-blogging community, one typical kind of anomalous group consists of cooperative, organized spammers who are hired by public relations companies and paid to post tweets with specific content. These spammers deliberately vary their content and behavior patterns to evade detection, and they cooperatively hijack trending topics to push a deliberate point of view, which can seriously affect people’s judgments and decisions. Because of the evolving nature and hidden behavior of these spammers, detecting such groups requires solving two problems. The first is to identify the anomalous topics hijacked by spammer groups among numerous trending topics. The second is to identify the members of the spammer groups among the users who join those anomalous topics. In this paper, we propose a two-stage topology-based method to detect spammer groups that are partially distributed across multiple trending topics. In the first stage, we detect anomalous topics among the trending topics using a new similarity measure based on subgraph ranking: a topic is identified as anomalous if the topological characteristics of its retweeting networks change dramatically between adjacent periods. In the second stage, starting from a few initially labeled spammers, we obtain several anomalous topic sequences by applying the basic idea of label propagation, and then cluster the users who join each topic sequence into group spammers and normal users according to their total authorities. A user’s total authority is his or her weighted cumulative authority over the anomalous topics of a topic sequence, where the authority in each topic is defined based on the user’s out-degree in the retweeting network.
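The first-stage test can be illustrated with a small sketch. The paper's subgraph-ranking similarity is not specified in this abstract, so the code below substitutes a simpler vertex-ranking stand-in: for each period it ranks users by out-degree in the retweeting network, compares the top-k sets of adjacent periods by Jaccard overlap, and flags the topic when any adjacent pair falls below a threshold. The edge direction, `k`, the `threshold` value, and all function names are hypothetical assumptions, not the authors' definitions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical edge format: (source, target) in a retweeting network.
# The direction convention is an assumption, not taken from the paper.
Edge = Tuple[str, str]


def top_k_by_out_degree(edges: Iterable[Edge], k: int) -> Set[str]:
    """Return the k users with the highest out-degree (ties broken by name)."""
    out_deg = Counter(src for src, _ in edges)
    ranked = sorted(out_deg.items(), key=lambda item: (-item[1], item[0]))
    return {user for user, _ in ranked[:k]}


def ranking_similarity(prev: Iterable[Edge], curr: Iterable[Edge], k: int = 10) -> float:
    """Stand-in similarity: Jaccard overlap of the top-k users of two periods.

    This is a simple vertex-ranking substitute, NOT the paper's
    subgraph-ranking measure. 1.0 means identical top-k sets.
    """
    a, b = top_k_by_out_degree(prev, k), top_k_by_out_degree(curr, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_anomalous_topic(periods: List[List[Edge]], k: int = 10,
                       threshold: float = 0.3) -> bool:
    """Flag a topic if any two adjacent periods are less similar than `threshold`.

    threshold = 0.3 is an arbitrary illustrative value (assumption).
    """
    sims = [ranking_similarity(p, q, k) for p, q in zip(periods, periods[1:])]
    return any(s < threshold for s in sims)


if __name__ == "__main__":
    # Toy data: the same core users dominate both periods of an organic topic,
    # while an entirely new set of users takes over the hijacked one.
    organic = [[("a", "x"), ("a", "y"), ("b", "z")],
               [("a", "x"), ("a", "w"), ("b", "y")]]
    hijacked = [[("a", "x"), ("a", "y"), ("b", "z")],
                [("s1", "x"), ("s2", "y"), ("s3", "z")]]
    print(is_anomalous_topic(organic, k=2))   # False
    print(is_anomalous_topic(hijacked, k=2))  # True
```

Comparing only the top-ranked vertices keeps the check cheap and focuses it on the users who dominate a topic's retweeting structure, which is where a coordinated takeover would be most visible.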
Experimental results on real-world data collected from the Sina micro-blogging site demonstrate that our similarity measure achieves leading performance on all evaluation metrics, and that our method effectively detects group spammers in comparison with other methods.
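The second-stage scoring can be sketched in the same spirit. The code assumes the anomalous topic sequences have already been found (the label-propagation step is not shown), divides each user's out-degree in a topic by that topic's maximum out-degree to obtain a per-topic authority, sums these with per-topic weights supplied by the caller, and splits users into two groups with one-dimensional 2-means (Lloyd's algorithm). The normalization, the weights, the choice of 2-means, treating the higher-scoring cluster as the spammer group, and all function names are illustrative assumptions rather than the paper's exact definitions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical edge format: (source, target) in a topic's retweeting network.
Edge = Tuple[str, str]


def topic_authority(edges: Sequence[Edge]) -> Dict[str, float]:
    """Per-topic authority: out-degree divided by the topic's maximum out-degree.

    The max-normalization is an assumption; the paper only says authority is
    based on out-degree.
    """
    out_deg = Counter(src for src, _ in edges)
    users = {u for edge in edges for u in edge}
    top = max(out_deg.values(), default=0) or 1  # avoid division by zero
    return {u: out_deg[u] / top for u in users}


def total_authority(topics: List[Sequence[Edge]],
                    weights: Sequence[float]) -> Dict[str, float]:
    """Weighted cumulative authority over the anomalous topics of one sequence.

    `weights` (one per topic) are caller-supplied, hypothetical inputs.
    """
    total: Dict[str, float] = {}
    for edges, w in zip(topics, weights):
        for user, auth in topic_authority(edges).items():
            total[user] = total.get(user, 0.0) + w * auth
    return total


def two_means(scores: Dict[str, float], iters: int = 100) -> Tuple[Set[str], Set[str]]:
    """1-D Lloyd's k-means with k = 2; returns (high cluster, low cluster).

    Assumption: the high-authority cluster is treated as the spammer group.
    """
    lo, hi = min(scores.values()), max(scores.values())
    high: Set[str] = set()
    for _ in range(iters):
        # Assign each user to the nearer centroid (ties go to the low cluster).
        high = {u for u, v in scores.items() if abs(v - hi) < abs(v - lo)}
        low = set(scores) - high
        new_hi = sum(scores[u] for u in high) / len(high) if high else hi
        new_lo = sum(scores[u] for u in low) / len(low) if low else lo
        if (new_hi, new_lo) == (hi, lo):
            break  # centroids stopped moving
        hi, lo = new_hi, new_lo
    return high, set(scores) - high


if __name__ == "__main__":
    # Toy topic sequence with two anomalous topics and equal (assumed) weights.
    topics = [
        [("s1", "u1"), ("s1", "u2"), ("s2", "u3"), ("s2", "u4"), ("n1", "u5")],
        [("s1", "u6"), ("s2", "u7"), ("s2", "u8"), ("n2", "u9")],
    ]
    scores = total_authority(topics, weights=[1.0, 1.0])
    spammers, normal = two_means(scores)
    print(sorted(spammers))  # ['s1', 's2']
```

Initializing the two centroids at the minimum and maximum scores makes the split deterministic for a given input, which keeps the toy example reproducible.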
Acknowledgements
The research presented in this paper is supported in part by the National Natural Science Foundation (61572397, 61502383, 61375040, 61571360), Fundamental Research Project of Natural Science in Shaanxi Province (2015JM6298, 2015JM6299) and Specialized Research Plan Funded Project of Shaanxi Province Department of Education (15JK1505).
Additional information
Responsible editor: G. Karypis.
About this article
Cite this article
Dang, Q., Zhou, Y., Gao, F. et al. Detecting cooperative and organized spammer groups in micro-blogging community. Data Min Knowl Disc 31, 573–605 (2017). https://doi.org/10.1007/s10618-016-0479-5
DOI: https://doi.org/10.1007/s10618-016-0479-5