Collective Spammer Detection in Evolving Multi-Relational Social Networks

Authors:

Shobeir Fakhraei,

Madhusudana Shashanka,

Lise Getoor

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 1769 - 1778

https://doi.org/10.1145/2783258.2788606

Published: 10 August 2015

Abstract

Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly-structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use Graphlab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network significantly gain predictive performance over those that do not.
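The sequence-based part of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the action names (`friend`, `msg`, `view`), the toy sequences, and all function names are hypothetical, and a single first-order Markov chain per class with add-one smoothing stands in for the mixture of Markov models named above. It shows how k-gram counts turn a user's time-ordered actions into features, and how two chains can be compared by log-likelihood.

```python
from collections import Counter
from math import log

def kgram_counts(actions, k):
    """Count every contiguous run of k actions in one user's sequence."""
    return Counter(tuple(actions[i:i + k]) for i in range(len(actions) - k + 1))

def fit_markov_chain(sequences, alphabet):
    """Estimate first-order transition probabilities with add-one smoothing."""
    counts = {a: Counter() for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + len(alphabet)
        probs[a] = {b: (counts[a][b] + 1) / total for b in alphabet}
    return probs

def log_likelihood(seq, probs):
    """Log-probability of the transitions in seq under one chain."""
    return sum(log(probs[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))

# Hypothetical action alphabet and toy labeled sequences (not from the paper).
ACTIONS = ("friend", "msg", "view")
spam_seqs = [
    ["friend", "msg", "friend", "msg", "friend", "msg"],
    ["msg", "msg", "friend", "msg"],
]
legit_seqs = [
    ["view", "view", "msg", "view"],
    ["view", "friend", "view", "view"],
]

spam_chain = fit_markov_chain(spam_seqs, ACTIONS)
legit_chain = fit_markov_chain(legit_seqs, ACTIONS)

# Score an unseen user: a positive log-odds means the spam chain fits better.
new_user = ["friend", "msg", "friend", "msg"]
score = log_likelihood(new_user, spam_chain) - log_likelihood(new_user, legit_chain)

print(kgram_counts(new_user, 2))
print(score)
```

In a fuller pipeline, the k-gram counts and the log-odds score would be two of several inputs to a downstream classifier, alongside the structural graph features the abstract mentions.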

Supplementary Material

MP4 File (p1769.mp4), 241.47 MB

Index Terms

Collective Spammer Detection in Evolving Multi-Relational Social Networks
1. Computing methodologies
  1. Machine learning

Published In

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2015

2378 pages

ISBN:9781450336642

DOI:10.1145/2783258

General Chairs: Longbing Cao (University of Technology, Sydney), Chengqi Zhang (University of Technology, Sydney)

Program Chairs: Thorsten Joachims (Cornell University), Geoff Webb (Monash University), Dragos D. Margineantu (Boeing Research), Graham Williams (Australian Taxation Office)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

KDD '15

KDD '15: The 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 10 - 13, 2015

Sydney, NSW, Australia

Acceptance Rates

KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 63
Total Downloads: 1,005
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 3

Reflects downloads up to 05 Mar 2025


Cited By

Zhou Y, Gao S, Guo D, Wei X, Rokne J, Wang H (2025). A Survey of Change Point Detection in Dynamic Graphs. IEEE Transactions on Knowledge and Data Engineering, 37:3 (1030-1048). Online publication date: Mar-2025.
https://doi.org/10.1109/TKDE.2024.3523857
Tiwari M, Gepp A, Kumar K (2025). Using data analytics to distinguish legitimate and illegitimate shell companies. Journal of Economic Criminology, 7 (100123). Online publication date: Mar-2025.
https://doi.org/10.1016/j.jeconc.2024.100123
Zhang J, Wang Y, Yang X, Zhu E (2024). A Fully Test-time Training Framework for Semi-supervised Node Classification on Out-of-Distribution Graphs. ACM Transactions on Knowledge Discovery from Data, 18:7 (1-19). Online publication date: 26-Feb-2024.
https://dl.acm.org/doi/10.1145/3649507
Wang S, Yang X, Islam R, Chen H, Xu M, Li J, Cai Y (2024). Enhancing Distribution and Label Consistency for Graph Out-of-Distribution Generalization. 2024 IEEE International Conference on Data Mining (ICDM) (875-880). Online publication date: 9-Dec-2024.
https://doi.org/10.1109/ICDM59182.2024.00108
Sallah A, Arbi Abdellaoui Alaoui E, Agoujil S, Ahmad Wani M, Hammad M, Maleh Y, Abd El-Latif A (2024). Fine-Tuned Understanding: Enhancing Social Bot Detection With Transformer-Based Classification. IEEE Access, 12 (118250-118269). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3440657
Zeng Z, Xie J, Yang Z, Ma T, Chen D (2024). TO-UGDA: target-oriented unsupervised graph domain adaptation. Scientific Reports, 14:1. Online publication date: 22-Apr-2024.
https://doi.org/10.1038/s41598-024-59890-y
Shi Y, Zheng L, Quan P, Niu L (2024). Wasserstein distance regularized graph neural networks. Information Sciences (120608). Online publication date: Apr-2024.
https://doi.org/10.1016/j.ins.2024.120608
Jin Y, Wang M, Xiong Y, Ren Z, Huo C, Zhu F, Zhang J, Wang G, Chen H (2024). Bridging distribution gaps: invariant pattern discovery for dynamic graph learning. World Wide Web, 27:4. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1007/s11280-024-01283-2
Sallah A, Abdellaoui Alaoui E, Hessane A, Agoujil S, Nayyar A (2024). An efficient fake account identification in social media networks: Facebook and Instagram using NSGA-II algorithm. Neural Computing and Applications. Online publication date: 28-Aug-2024.
https://doi.org/10.1007/s00521-024-10350-8
Jin B, Li S, Huang J (2024). GraphSAGE-Based Spammer Detection Using Social Attribute Relationship. Technologies and Applications of Artificial Intelligence (300-313). Online publication date: 28-Mar-2024.
https://doi.org/10.1007/978-981-97-1711-8_23
