ABSTRACT
The popularity of social bookmarking sites has made them prime targets for spammers. Many of these systems require an administrator's time and energy to manually filter or remove spam. Here we discuss the motivations behind social spam and present a study of automatic detection of spammers in a social tagging system. We identify and analyze six distinct features that address various properties of social spam, finding that each of these features provides a helpful signal for discriminating spammers from legitimate users. These features are then used in various machine learning algorithms for classification, achieving over 98% accuracy in detecting social spammers with 2% false positives. These promising results provide a new baseline for future efforts on social spam. We make our dataset publicly available to the research community.
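To make the two reported metrics concrete, the following minimal sketch computes accuracy and false positive rate from true and predicted labels. It assumes the false-positive figure is read as a rate over legitimate users. The function name, the label encoding (1 = spammer, 0 = legitimate user), and the toy labels are hypothetical and are not taken from the study or its dataset.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels.

    Assumed encoding (hypothetical, not from the paper):
    1 = spammer, 0 = legitimate user.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spammers caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate users passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate users flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spammers missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # False positive rate: share of legitimate users wrongly flagged as spammers.
    negatives = fp + tn
    fpr = fp / negatives if negatives else 0.0
    return accuracy, fpr


# Toy labels invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={acc:.2f}, false positive rate={fpr:.2f}")
```

Reporting both numbers matters because, when spammers and legitimate users are unevenly represented, a high accuracy alone can hide a large share of wrongly flagged legitimate users.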
Index Terms
- Social spam detection