Large margin classification for combating disguise attacks on spam filters

Zhou, Xi-chuan; Shen, Hai-bin; Huang, Zhi-yong; Li, Guo-jun

doi:10.1631/jzus.C1100259

Published: 06 March 2012

Volume 13, pages 187–195 (2012)
Journal of Zhejiang University SCIENCE C

Abstract

This paper addresses large margin classification for spam filtering in the presence of an adversary who disguises spam messages to avoid detection. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary can afford to modify a spam message only to a limited extent without destroying its utility to the spammer. Under this assumption, we present a large margin approach for classifying spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming (SOCP) problem. We performed a series of experiments on the TREC 2006 Spam Corpus. The results show that the performance of the standard support vector machine (SVM) degrades rapidly as the adversary injects or removes more words, whereas the proposed approach remains more stable under the disguise attack.
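
To make the threat model concrete, the sketch below computes the worst-case effect of a bounded disguise attack on an already-trained linear filter. It is not the authors' SOCP classifier, whose formulation is not given here. It assumes a binary bag-of-words message, a linear score where a positive value means spam and positive weights mark "bad" (spammy) words, and an attacker allowed at most `budget` word additions or removals. The function name `worst_case_disguise` and the toy vocabulary are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def worst_case_disguise(
    weights: Dict[str, float],
    bias: float,
    message_words: Set[str],
    budget: int,
) -> Tuple[float, Set[str], Set[str]]:
    """Lowest spam score reachable by adding or removing at most `budget` words.

    Hypothetical helper. Assumes score(x) = bias + sum of weights of the words
    present in the message, with score > 0 meaning "spam".
    Returns (score after attack, words added, words removed).
    """
    # Each candidate edit is (score reduction, action, word).
    candidates: List[Tuple[float, str, str]] = []
    for word, w in weights.items():
        if word in message_words and w > 0:
            # Removing a present bad word lowers the score by w.
            candidates.append((w, "remove", word))
        elif word not in message_words and w < 0:
            # Injecting an absent good word lowers the score by |w|.
            candidates.append((-w, "add", word))

    # For a linear score every edit contributes independently,
    # so taking the k largest reductions is the optimal attack.
    candidates.sort(key=lambda c: c[0], reverse=True)
    chosen = candidates[:budget]

    base = bias + sum(weights.get(word, 0.0) for word in message_words)
    score = base - sum(reduction for reduction, _, _ in chosen)
    added = {word for _, action, word in chosen if action == "add"}
    removed = {word for _, action, word in chosen if action == "remove"}
    return score, added, removed


if __name__ == "__main__":
    # Toy filter and spam message (illustrative values only).
    toy_weights = {"viagra": 2.0, "free": 1.0, "meeting": -1.5, "thanks": -0.5}
    toy_message = {"viagra", "free"}
    for k in range(3):
        s, add, rem = worst_case_disguise(toy_weights, -0.5, toy_message, k)
        print(k, s, sorted(add), sorted(rem))
    # 0 2.5 [] []
    # 1 0.5 [] ['viagra']
    # 2 -1.0 ['meeting'] ['viagra']
```

In this toy case two edits (removing "viagra" and injecting "meeting") push the message below the decision threshold, which illustrates why a filter trained without accounting for such edits can degrade quickly. A robust large margin method, in the spirit the abstract describes, would instead aim to keep this worst-case score on the correct side of the margin during training.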

Similar content being viewed by others

Supervised Machine Learning Classifier for Email Spam Filtering

Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

A Comparative Study of Spam SMS Detection Techniques for English Content Using Supervised Machine Learning Algorithms

Author information

Authors and Affiliations

College of Communications Engineering, Chongqing University, Chongqing, 400044, China
Xi-chuan Zhou & Zhi-yong Huang
Institute of VLSI Design, Zhejiang University, Hangzhou, 310027, China
Hai-bin Shen
Chongqing Communication Institute, Chongqing, 400032, China
Guo-jun Li

Corresponding author

Correspondence to Xi-chuan Zhou.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61103212) and the Natural Science Foundation of CQ CSTC, China (No. cstcjjA40005).

About this article

Cite this article

Zhou, Xc., Shen, Hb., Huang, Zy. et al. Large margin classification for combating disguise attacks on spam filters. J. Zhejiang Univ. - Sci. C 13, 187–195 (2012). https://doi.org/10.1631/jzus.C1100259

Received: 02 September 2011
Accepted: 25 October 2011
Published: 06 March 2012
Issue Date: March 2012
DOI: https://doi.org/10.1631/jzus.C1100259

CLC number

TP393.098