Abstract
Spammers create large numbers of compromised or fake accounts to disseminate harmful information in social networks such as Twitter, and identifying these social spammers has become a challenging problem. Most existing algorithms for social spammer detection are based on supervised learning, which requires a large amount of labeled training data. However, labeling a sufficiently large training set is costly, which makes supervised learning impractical for social spammer detection. In this paper, we propose a semi-supervised framework for social spammer detection (SSSD), which combines a supervised classification model with a ranking scheme on the social graph. First, we train an initial classifier on a small amount of labeled data. Second, we propose a ranking model that propagates trust and distrust on the social graph. Third, we select users who are confidently judged by both the classifier and the ranking scores, add them to the training data, and retrain the classifier. We repeat these steps until the classifier can no longer be refined. Experimental results show that our framework can effectively detect social spammers when sufficient labeled data are lacking.
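The three-step loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the ranking model, or the confidence rule, so several assumptions are made here. A nearest-centroid classifier stands in for the supervised model. Trust and distrust are propagated in a TrustRank / Anti-TrustRank style, as personalized PageRank from labeled legitimate users along follow edges and from labeled spammers against follow edges. A user is added to the training set only when the classifier score clears a threshold and the two ranking scores agree with it. All names (`sssd`, `propagate`, `conf`, `alpha`), the stopping rule, and the toy features and graph are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each user has two made-up features
# [fraction of tweets with URLs, follow-back ratio]; not from the paper.
FEATURES = {
    "l1": [0.1, 0.9], "l2": [0.2, 0.8], "l3": [0.15, 0.85],
    "s1": [0.9, 0.1], "s2": [0.8, 0.2], "s3": [0.85, 0.15],
    "u1": [0.5, 0.5],
}
# Directed follow edges: FOLLOWS[u] lists the users u follows.
FOLLOWS = {
    "l1": ["l2", "l3"], "l2": ["l3"], "l3": ["l1"],
    "s1": ["s2", "l1"],  # spammers may also follow legitimate users
    "s2": ["s3"], "s3": ["s1"],
}


def reverse(edges):
    """Flip every edge u -> v to v -> u, so rev[v] lists the followers of v."""
    rev = defaultdict(list)
    for u, targets in edges.items():
        for v in targets:
            rev[v].append(u)
    return dict(rev)


def propagate(edges, nodes, seeds, alpha=0.85, iters=50):
    """Personalized PageRank from `seeds`; a TrustRank-style stand-in for the ranking model."""
    if not seeds:
        return {n: 0.0 for n in nodes}
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(base)
    for _ in range(iters):
        # Teleport back to the seeds with probability 1 - alpha.
        nxt = {n: (1 - alpha) * base[n] for n in nodes}
        for u in nodes:
            targets = edges.get(u, [])
            for v in targets:
                # Split u's score evenly over its outgoing edges.
                nxt[v] += alpha * score[u] / len(targets)
        score = nxt
    return score


def train_centroids(features, labels):
    """Nearest-centroid stand-in classifier: mean feature vector per class (0 = legit, 1 = spam)."""
    centroids = {}
    for cls in (0, 1):
        rows = [features[u] for u, y in labels.items() if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def spam_probability(centroids, x):
    """Score in [0, 1]; the closer x is to the spam centroid, the closer to 1."""
    d_legit = math.dist(x, centroids[0])
    d_spam = math.dist(x, centroids[1])
    total = d_legit + d_spam
    return 0.5 if total == 0 else d_legit / total


def sssd(features, follows, seed_labels, conf=0.8, max_rounds=10):
    """Self-training loop; seed_labels must contain at least one user of each class."""
    labels = dict(seed_labels)
    nodes = list(features)
    followers = reverse(follows)
    for _ in range(max_rounds):
        model = train_centroids(features, labels)
        # Trust flows along follow edges from known legitimate users;
        # distrust flows backwards, from known spammers to their followers.
        trust = propagate(follows, nodes, {u for u, y in labels.items() if y == 0})
        distrust = propagate(followers, nodes, {u for u, y in labels.items() if y == 1})
        new = {}
        for u in nodes:
            if u in labels:
                continue
            p = spam_probability(model, features[u])
            # Keep a user only if the classifier is confident and the graph agrees.
            if p >= conf and distrust[u] > trust[u]:
                new[u] = 1
            elif p <= 1 - conf and trust[u] > distrust[u]:
                new[u] = 0
        if not new:  # no confident users left: assume the classifier cannot be refined
            break
        labels.update(new)
    return train_centroids(features, labels), labels


model, labels = sssd(FEATURES, FOLLOWS, {"l1": 0, "s1": 1})
print(sorted(labels.items()))
# → [('l1', 0), ('l2', 0), ('l3', 0), ('s1', 1), ('s2', 1), ('s3', 1)]
```

On this toy graph, the first round adds l2 and l3 as legitimate and s2 and s3 as spammers. In the second round, the only remaining user, u1, lies halfway between the two class centroids, so nothing is added and the loop stops. Requiring the classifier and the graph scores to agree is one plausible reading of "confident users"; a weighted combination of the scores would be another.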
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, Z., Zhang, X., Shen, H., Liang, W., He, Z. (2015). A Semi-Supervised Framework for Social Spammer Detection. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_14
DOI: https://doi.org/10.1007/978-3-319-18032-8_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science; Computer Science (R0)