DOI: 10.1145/1531914.1531924
research-article

Social spam detection

Published: 21 April 2009

ABSTRACT

The popularity of social bookmarking sites has made them prime targets for spammers. Many of these systems require an administrator's time and energy to manually filter or remove spam. Here we discuss the motivations of social spam and present a study of automatic detection of spammers in a social tagging system. We identify and analyze six distinct features that address various properties of social spam, and find that each of these features provides a helpful signal for discriminating spammers from legitimate users. These features are then used as input to several machine learning classifiers, achieving over 98% accuracy in detecting social spammers with 2% false positives. These promising results provide a new baseline for future efforts on social spam. We make our dataset publicly available to the research community.
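The abstract describes a per-user pipeline: compute six feature scores for each user, feed them to a classifier, and report accuracy together with false positives. The sketch below illustrates that general pipeline using a plain logistic-regression classifier written with the Python standard library only. Everything specific here is an assumption for illustration: the six feature values are placeholders (the abstract does not define the features), logistic regression is a stand-in (the abstract does not name the algorithms used), the toy data is made up, and the helper names `train_logistic`, `predict`, and `evaluate` are hypothetical. It also assumes "false positives" means the share of legitimate users wrongly flagged as spammers.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical per-user feature vector: six scores in [0, 1], one per
# spam-related feature. The feature semantics are placeholders, not the
# paper's definitions.
FeatureVector = Sequence[float]


@dataclass
class Metrics:
    accuracy: float
    false_positive_rate: float  # share of legitimate users flagged as spammers


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(
    X: List[FeatureVector], y: List[int], lr: float = 0.5, epochs: int = 1000
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias with batch gradient descent.

    Labels: 1 = spammer, 0 = legitimate user.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this user: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step downhill.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: FeatureVector, threshold: float = 0.5) -> int:
    """Return 1 (spammer) if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def evaluate(y_true: List[int], y_pred: List[int]) -> Metrics:
    """Accuracy over all users; false-positive rate over legitimate users only."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return Metrics(accuracy=correct / len(y_true), false_positive_rate=fpr)


if __name__ == "__main__":
    # Made-up toy data for illustration only; not the paper's dataset.
    X = [
        [0.9, 0.8, 0.7, 0.9, 0.8, 0.6],  # spammer
        [0.8, 0.9, 0.8, 0.7, 0.9, 0.7],  # spammer
        [0.7, 0.7, 0.9, 0.8, 0.6, 0.8],  # spammer
        [0.1, 0.2, 0.1, 0.2, 0.1, 0.3],  # legitimate
        [0.2, 0.1, 0.3, 0.1, 0.2, 0.1],  # legitimate
        [0.3, 0.2, 0.2, 0.1, 0.3, 0.2],  # legitimate
    ]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    print(evaluate(y, [predict(w, b, x) for x in X]))
```

Reporting the false-positive rate alongside accuracy matters in this setting because wrongly flagging a legitimate user is costly for a site that relies on administrators to review spam; raising `threshold` in `predict` trades more missed spammers for fewer false positives.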


Published in
AIRWeb '09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
April 2009
67 pages
ISBN: 9781605584386
DOI: 10.1145/1531914

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
