Abstract
Social spammers spread rumors, advertisements, and malicious links through social networks, disrupting users' normal access and causing security problems. Most existing methods detect social spammers using content, behavior, and relationship-graph features, and they rely on large-scale labeled data. In the real world, however, labeled training data are scarce, and manual annotation is time-consuming and labor-intensive. To address this problem, we propose a novel method that combines active learning with co-training to make full use of unlabeled data. In co-training, user features are divided into two non-overlapping views, and classifiers are trained iteratively on the labeled instances together with the most confident unlabeled instances, which receive pseudo-labels. In active learning, the most representative and uncertain instances are selected and annotated with real labels to extend the labeled dataset. Experimental results on the Twitter and Apontador datasets show that our method can effectively detect social spammers when labeled data are limited.
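
The interplay between the two components can be illustrated with a minimal sketch. It is not the ACTSSD implementation: the function names, the nearest-centroid base classifier, the toy data, and the parameters `n_confident` and `n_query` are all assumptions made for illustration, and the query step uses uncertainty only, omitting the representativeness criterion mentioned above.

```python
import math

def fit_centroids(X, y):
    """Stand-in base classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def spam_probability(centroids, x):
    """Softmax over negative centroid distances; returns P(label == 1)."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    return weights.get(1, 0.0) / sum(weights.values())

def co_train_with_queries(view_a, view_b, seed_labels, oracle,
                          rounds=5, n_confident=1, n_query=1):
    """Hypothetical loop: co-training on two views plus uncertainty queries."""
    labels = dict(seed_labels)  # index -> label (real or pseudo)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        idx = list(labels)
        y = [labels[i] for i in idx]
        # Train one classifier per view on the current labeled pool.
        probs = []
        for view in (view_a, view_b):
            model = fit_centroids([view[i] for i in idx], y)
            probs.append({i: spam_probability(model, view[i]) for i in unlabeled})
        # Active learning: ask the oracle about the most uncertain instances
        # (averaged probability closest to 0.5).
        mean_p = {i: (probs[0][i] + probs[1][i]) / 2 for i in unlabeled}
        queried = sorted(unlabeled, key=lambda i: abs(mean_p[i] - 0.5))[:n_query]
        new_labels = {i: oracle(i) for i in queried}
        # Co-training: each view pseudo-labels its most confident remaining instances.
        for p in probs:
            rest = [i for i in unlabeled if i not in new_labels]
            for i in sorted(rest, key=lambda i: -abs(p[i] - 0.5))[:n_confident]:
                new_labels[i] = 1 if p[i] >= 0.5 else 0
        labels.update(new_labels)
    return labels

# Toy data (invented): 1 = spammer, 0 = legitimate user.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
view_a = [[1.0, 0.9], [0.9, 1.0], [1.1, 1.0], [0.8, 0.9],
          [0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2]]
view_b = [[5.0], [4.5], [5.5], [4.0], [1.0], [0.5], [1.5], [2.0]]

result = co_train_with_queries(view_a, view_b, {0: 1, 4: 0},
                               oracle=lambda i: truth[i])
```

Here `oracle` plays the role of the human annotator, and the two views share one labeled pool, which is one common co-training variant; other variants let each classifier teach only the other.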

Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Chen, A., Yang, P. & Cheng, P. ACTSSD: social spammer detection based on active learning and co-training. J Supercomput 78, 2744–2771 (2022). https://doi.org/10.1007/s11227-021-03966-3