Abstract
Content filtering is a popular approach to spam detection. It focuses on analysis of the message content to identify spam. In this paper, we evaluate the use of social network analysis measures to improve the performance of a content filtering model. By measuring the degree centrality of message transfer agents, we observed performance improvements for spam detection in repeated experiments; e.g. a 70% increase in the proportion of spam detected with a false positive rate of 0.1%. We were also able to use anomaly detection to identify mislabeled messages in a publicly available spam data set. Messages claiming unusually long paths between the sender’s message transfer agent and the recipient’s message transfer agent turned out to be spam.
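The abstract does not say how the message transfer agent (MTA) graph was built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes MTA hops can be read from simplified `Received: from X by Y` headers, links each pair of hosts with an undirected edge, and computes Freeman's normalized degree centrality (degree divided by n - 1). The sample messages, hostnames, regular expression, and function names are all hypothetical.

```python
import email
import re
from collections import defaultdict

# Hypothetical messages; every hostname is invented for illustration.
RAW_MESSAGES = [
    ("Received: from relay.example.net by mx.example.org; Mon, 1 Jan 2007 10:00:02 +0000\n"
     "Received: from sender.example.com by relay.example.net; Mon, 1 Jan 2007 10:00:01 +0000\n"
     "Subject: hello\n\nbody\n"),
    ("Received: from relay.example.net by mx.example.org; Mon, 1 Jan 2007 11:00:02 +0000\n"
     "Received: from bulk.example.com by relay.example.net; Mon, 1 Jan 2007 11:00:01 +0000\n"
     "Subject: offer\n\nbody\n"),
    ("Received: from other.example.com by mx.example.org; Mon, 1 Jan 2007 12:00:00 +0000\n"
     "Subject: note\n\nbody\n"),
]

# Simplifying assumption: each header has the bare form "from HOST by HOST".
# Real Received headers carry extra detail that this pattern ignores.
HOP = re.compile(r"from\s+(\S+)\s+by\s+([^\s;]+)", re.IGNORECASE)


def received_hops(raw):
    """Return (from_host, by_host) pairs claimed by a message's Received headers."""
    msg = email.message_from_string(raw)
    hops = []
    for header in msg.get_all("Received", []):
        match = HOP.search(header)
        if match:
            hops.append((match.group(1).lower(), match.group(2).lower()))
    return hops


def path_length(raw):
    """Number of MTA hops the message claims to have taken."""
    return len(received_hops(raw))


def degree_centrality(messages):
    """Normalized degree centrality for each host in the undirected hop graph."""
    neighbors = defaultdict(set)
    for raw in messages:
        for src, dst in received_hops(raw):
            if src != dst:
                neighbors[src].add(dst)
                neighbors[dst].add(src)
    n = len(neighbors)
    if n < 2:
        return {host: 0.0 for host in neighbors}
    return {host: len(adj) / (n - 1) for host, adj in neighbors.items()}


if __name__ == "__main__":
    for host, score in sorted(degree_centrality(RAW_MESSAGES).items()):
        print(f"{host}: {score:.2f}")
    print([path_length(raw) for raw in RAW_MESSAGES])
```

In a setup like this, the centrality of the hosts on a message's path and the claimed path length could be appended to the content features given to a classifier. Received headers can be forged by the sender, so the claimed path is evidence to weigh rather than ground truth.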
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
DeBarr, D., Wechsler, H. (2010). Using Social Network Analysis for Spam Detection. In: Chai, SK., Salerno, J.J., Mabry, P.L. (eds) Advances in Social Computing. SBP 2010. Lecture Notes in Computer Science, vol 6007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12079-4_10
DOI: https://doi.org/10.1007/978-3-642-12079-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12078-7
Online ISBN: 978-3-642-12079-4
eBook Packages: Computer Science, Computer Science (R0)