DOI: 10.1145/2783258.2788606
research-article

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Published: 10 August 2015

Abstract

Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly-structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use Graphlab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network significantly gain predictive performance over those that do not.
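To make the k-gram idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each edge is a `(timestamp, source_user, target_user, relation)` tuple, and the relation names (`friend_request`, `message`, `profile_view`) and function names are hypothetical. It orders each user's outgoing actions by time and counts contiguous runs of k relation types, which could then serve as features for a classifier.

```python
from collections import Counter, defaultdict


def user_action_sequences(edges):
    """Build one chronologically ordered relation sequence per source user.

    `edges` is an iterable of (timestamp, source_user, target_user, relation)
    tuples; this layout is an assumption made for illustration.
    """
    per_user = defaultdict(list)
    for timestamp, source, _target, relation in edges:
        per_user[source].append((timestamp, relation))
    return {
        user: [relation for _, relation in sorted(events, key=lambda e: e[0])]
        for user, events in per_user.items()
    }


def kgram_counts(sequence, k):
    """Count every contiguous run of k relation types in one sequence."""
    return Counter(
        tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)
    )


# Toy time-stamped multi-relational graph (hypothetical data).
edges = [
    (3, "u1", "u4", "message"),
    (1, "u1", "u2", "friend_request"),
    (2, "u1", "u3", "message"),
    (1, "u2", "u1", "profile_view"),
    (4, "u1", "u5", "message"),
]

sequences = user_action_sequences(edges)
bigrams = {user: kgram_counts(seq, 2) for user, seq in sequences.items()}
```

Here `bigrams["u1"]` records how often one action type directly follows another for user `u1`. A user whose sequence is shorter than k simply gets an empty count.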

Supplementary Material

MP4 File (p1769.mp4)


    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. collective classification
    2. graph mining
    3. graphlab
    4. heterogeneous networks
    5. hinge-loss markov random fields (hl-mrf)
    6. k-grams
    7. multi-relational networks
    8. probabilistic soft logic (psl)
    9. sequence mining
    10. social networks
    11. social spam
    12. spam
    13. tree-augmented naive bayes

    Qualifiers

    • Research-article

    Conference

    KDD '15
    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
