
Large margin classification for combating disguise attacks on spam filters

Journal of Zhejiang University SCIENCE C

Abstract

This paper addresses large margin classification for spam filtering in the presence of an adversary who disguises spam messages to evade detection. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary can afford to modify a spam message only to a certain extent, without damaging its utility for the spammer. Under this assumption, we present a large margin approach for classifying spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming optimization problem. We performed a set of experiments using the TREC 2006 Spam Corpus. The results showed that the performance of the standard support vector machine (SVM) degrades rapidly as more words are injected or removed by the adversary, while the proposed approach is more stable under the disguise attack.
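
The threat model in the abstract can be illustrated with a small sketch. It is not the classifier proposed in the paper (the second-order cone formulation is not given on this page); it only shows how a budget-limited adversary could lower the score of a plain linear filter by inserting good words or removing bad words. The vocabulary, weights, bias, and the function names `score` and `worst_case_score` are all hypothetical and chosen for illustration.

```python
# Hypothetical linear spam filter over binary bag-of-words features.
# Convention assumed here: score > 0 means "spam".

def score(weights, bias, x):
    """Linear decision value for a binary feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def worst_case_score(weights, bias, x, budget):
    """Lowest score an adversary can reach with at most `budget` word edits.

    An edit is either inserting an absent good word (negative weight)
    or removing a present bad word (positive weight).
    """
    reductions = []
    for w, xi in zip(weights, x):
        if xi == 0 and w < 0:
            reductions.append(-w)   # insert a good word
        elif xi == 1 and w > 0:
            reductions.append(w)    # remove a bad word
    # For a linear model, the most damaging edits are simply the largest ones.
    reductions.sort(reverse=True)
    return score(weights, bias, x) - sum(reductions[:budget])


# Illustrative vocabulary: two "bad" words, two "good" words (assumed values).
vocab = ["viagra", "free", "meeting", "report"]
weights = [2.0, 1.0, -1.5, -1.0]
bias = -0.5
spam = [1, 1, 0, 0]  # message contains "viagra" and "free"

for k in range(3):
    print(k, worst_case_score(weights, bias, spam, k))
```

In this toy setting the message is still flagged after one edit but slips past the filter after two, which mirrors the degradation the abstract reports for the standard SVM as the adversary's budget grows.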

Author information

Corresponding author

Correspondence to Xi-chuan Zhou.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61103212) and the Natural Science Foundation of CQ CSTC, China (No. cstcjjA40005).

About this article

Cite this article

Zhou, Xc., Shen, Hb., Huang, Zy. et al. Large margin classification for combating disguise attacks on spam filters. J. Zhejiang Univ. - Sci. C 13, 187–195 (2012). https://doi.org/10.1631/jzus.C1100259

  • DOI: https://doi.org/10.1631/jzus.C1100259
