SPADE: a social-spam analytics and detection framework

Wang, De; Irani, Danesh; Pu, Calton

doi:10.1007/s13278-014-0189-1

Original Article
Published: 10 April 2014

Volume 4, article number 189 (2014)

Social Network Analysis and Mining

De Wang¹,
Danesh Irani¹ &
Calton Pu¹

483 Accesses
16 Citations

Abstract

Social media such as Facebook, MySpace, and Twitter have become increasingly important, attracting millions of users. Consequently, spammers are increasingly using such networks to propagate spam. Although existing filtering techniques such as collaborative filters and behavioral analysis filters can significantly reduce spam, each social network needs to build its own independent spam filter and support a spam team to keep its spam prevention techniques current. To alleviate these problems, we propose a framework for spam analytics and detection that can be used across all social network sites. Specifically, the proposed framework, SPADE, has numerous benefits: (1) new spam detected on one social network can quickly be identified across social networks; (2) the accuracy of spam detection will be improved through cross-domain classification and associative classification; (3) other techniques (such as blacklists and message shingling) can be integrated and centralized; and (4) new social networks can plug into the system easily, preventing spam at an early stage. In SPADE, we present a uniform schema model that allows cross-social network integration. In this paper, we define the user, message, and web page models. Moreover, we provide an experimental study of real datasets from social networks to demonstrate the flexibility and feasibility of our framework. We extensively evaluated two major classification approaches in SPADE: cross-domain classification and associative classification. In cross-domain classification, SPADE achieved over 0.92 F-measure and over 91% detection accuracy on the web page model using a Naïve Bayes classifier. In associative classification, SPADE achieved 0.89 F-measure on the message model and 0.87 F-measure on the user profile model. Both detection accuracies exceed 85%. Based on these results, SPADE is demonstrated to be a competitive spam detection solution for social media.
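The abstract names two mechanisms that are easier to picture concretely: a uniform schema that lets records from different social networks reach one shared classifier, and associative classification, where the label of an object is informed by the labels of associated objects (here, the web pages a message links to). The following is a minimal sketch under stated assumptions, not SPADE's implementation: the field names (`text`, `urls`, `content`), the adapter `from_twitter_like` and its input keys, the from-scratch multinomial Naïve Bayes, and the rule "spam if the message text or any linked page is classified as spam" are all hypothetical illustrations. F-measure is computed in the standard way, as the harmonic mean of precision and recall for the spam class.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WebPage:
    # Hypothetical uniform web page model: URL plus extracted text content.
    url: str
    content: str


@dataclass
class Message:
    # Hypothetical uniform message model: message text plus the URLs it links to.
    text: str
    urls: list = field(default_factory=list)


def from_twitter_like(raw: dict) -> Message:
    # Hypothetical adapter: maps one network's record format onto the uniform model,
    # so downstream classifiers never see network-specific field names.
    return Message(text=raw["tweet_text"], urls=raw.get("links", []))


def tokenize(s: str) -> list:
    return s.lower().split()


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(doc) if w in self.vocab)
        return max(self.classes, key=log_score)


def classify_message(msg, text_clf, page_clf, pages_by_url):
    # Assumed associative rule: the message is spam if its own text is classified
    # as spam, or if any linked page we have content for is classified as spam.
    if text_clf.predict(msg.text) == "spam":
        return "spam"
    for url in msg.urls:
        page = pages_by_url.get(url)
        if page is not None and page_clf.predict(page.content) == "spam":
            return "spam"
    return "ham"


def f_measure(y_true, y_pred, positive="spam"):
    # F-measure: harmonic mean of precision and recall for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Cross-domain use: the page classifier is trained on pages from one source
    # and then applied to pages linked from messages on a different network.
    page_clf = NaiveBayes().fit(
        ["cheap pills buy now", "win free money now", "research paper on graphs", "team meeting notes"],
        ["spam", "spam", "ham", "ham"])
    text_clf = NaiveBayes().fit(
        ["click here free", "lunch tomorrow", "free prize click", "see you at class"],
        ["spam", "ham", "spam", "ham"])
    pages = {"http://x.example": WebPage("http://x.example", "buy cheap pills now")}
    msg = from_twitter_like({"tweet_text": "check this out", "links": ["http://x.example"]})
    print(classify_message(msg, text_clf, page_clf, pages))  # spam (via the linked page)
```

In the toy run, the message text alone looks harmless, so only the association with its linked page flags it. The abstract reports results with a Naïve Bayes classifier; this from-scratch version is only a stand-in and does not reproduce those results.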

Notes

Weka is an open-source collection of machine learning algorithms that has become a standard tool in the machine learning community.
https://mahout.apache.org/.

Acknowledgments

This research has been partially funded by the National Science Foundation through the CNS/SAVI (1250260), IUCRC/FRP (1127904), CISE/CNS (1138666), RAPID (1138666), CISE/CRI (0855180), and NetSE (0905493) programs, and by gifts, grants, or contracts from DARPA/I2O, the Singapore Government, Fujitsu Labs, and the Georgia Tech Foundation through the John P. Imlay, Jr. Chair endowment. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the other funding agencies and companies mentioned above.

Author information

Authors and Affiliations

College of Computing, Georgia Institute of Technology, Atlanta, GA, 30332-0250, USA
De Wang, Danesh Irani & Calton Pu

Corresponding author

Correspondence to De Wang.

About this article

Cite this article

Wang, D., Irani, D. & Pu, C. SPADE: a social-spam analytics and detection framework. Soc. Netw. Anal. Min. 4, 189 (2014). https://doi.org/10.1007/s13278-014-0189-1

Received: 02 May 2013
Revised: 22 March 2014
Accepted: 27 March 2014
Published: 10 April 2014
DOI: https://doi.org/10.1007/s13278-014-0189-1
