ABSTRACT
As Online Social Networks (OSNs) grow in popularity, they face the challenge of dealing with undesirable users and their malicious activities. The most common form of malicious activity on OSNs is spamming, wherein a bot (fake user) disseminates content, malware/viruses, etc. to legitimate users of the network. Common motives behind such activity include phishing, scams, viral marketing and so on, none of which the recipients intend to receive. It is thus highly desirable to devise techniques and methods for identifying spammers (spamming accounts) in OSNs. Aiming to exploit the tendency of legitimate users to form communities, this paper presents a community-based framework for identifying spammers in OSNs. The framework uses community-based features of OSN users to learn classification models that identify spamming accounts. Preliminary experiments on a real-world dataset with simulated spammers indicate that the proposed approach is promising and that community-based node features of OSN users can improve the performance of classifying spammers and legitimate users.
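The general idea of deriving community-based node features and learning a classifier from them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the feature names (`degree`, `intra_ratio`, `foreign_communities`), the toy graph, the premise that community memberships are already known, and the one-threshold decision stump standing in for a real learned classifier.

```python
from collections import defaultdict


def community_features(edges, communities):
    """Compute illustrative community-based features for every node.

    edges: iterable of (u, v) undirected links.
    communities: dict of community id -> set of member nodes
                 (assumed to be given; overlapping membership is allowed).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    membership = defaultdict(set)
    for cid, members in communities.items():
        for node in members:
            membership[node].add(cid)

    features = {}
    for node, nbrs in neighbors.items():
        own = membership[node]
        # Neighbors that share at least one community with this node.
        inside = sum(1 for n in nbrs if membership[n] & own)
        # Communities reached through links that the node itself has not joined.
        foreign = {c for n in nbrs for c in membership[n]} - own
        features[node] = {
            "degree": len(nbrs),
            "intra_ratio": inside / len(nbrs),
            "foreign_communities": len(foreign),
        }
    return features


def fit_stump(features, labels, key="intra_ratio"):
    """Learn a threshold t such that 'feature < t' predicts a spammer.

    A deliberately tiny stand-in for a real classifier; it needs at
    least two distinct feature values among the labeled nodes.
    """
    values = sorted({features[n][key] for n in labels})
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

    def errors(t):
        return sum((features[n][key] < t) != labels[n] for n in labels)

    return min(candidates, key=errors)


# Hypothetical toy network: two triangles of legitimate users plus one
# simulated spammer "s" that links into both groups but belongs to neither.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("s", "a1"), ("s", "a2"), ("s", "b1"), ("s", "b2"),
]
communities = {"A": {"a1", "a2", "a3"}, "B": {"b1", "b2", "b3"}}

feats = community_features(edges, communities)
labels = {node: node == "s" for node in feats}  # True marks the simulated spammer
threshold = fit_stump(feats, labels)
flagged = sorted(n for n in feats if feats[n]["intra_ratio"] < threshold)
print(flagged)
```

In this toy setting the simulated spammer stands out because none of its links stay inside a community it belongs to, whereas legitimate users keep most of their links within their own community. A realistic pipeline would detect the communities from the graph itself and evaluate a proper classifier on held-out data.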
Index Terms
- Community-based features for identifying spammers in online social networks