A Counting-Based Method for Massive Spam Mail Classification

Luo, Hao; Fang, Binxing; Yun, Xiaochun

doi:10.1007/11689522_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3903)

Included in the following conference series:

International Conference on Information Security Practice and Experience

Abstract

Prior research has explored the effectiveness of machine learning classifiers for filtering spam email, and the results show that such classifiers can achieve high precision and recall. However, because these methods are probabilistic, they cannot entirely avoid classifying normal mail as spam. The evident difference between spam and normal mail is that a single spam message is delivered to many users, whereas most normal messages have only one recipient. Based on this observation, this paper presents a server-based massive mail classifier that combines a counting-based classifier, a bitmap-based white list (BWL), and a grey list to filter massive spam mail. Results show that a spam classifier using this method filters spam with a very low false-positive rate and maintains its performance when coping with large volumes of spam mail. With an optimized parameter configuration, the method achieves a precision of 100% and a recall of 75.3% in spam mail classification.
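
The abstract gives only the idea behind the method, not the algorithm, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It counts the distinct recipients of each message body and flags a message as massive mail once a threshold is reached, unless the sender is on a white list stored as a bitmap. The class names, the SHA-1 fingerprint of the body, the bitmap size, and the threshold value are all hypothetical choices, and the grey list is omitted.

```python
import hashlib


class BitmapWhiteList:
    """Fixed-size bitmap indexed by a hash of the sender address.

    Assumed design: one bit per hash bucket, so two senders may share a bit.
    """

    def __init__(self, size_bits=1 << 16):
        self.size_bits = size_bits
        self.bits = bytearray(size_bits // 8)

    def _index(self, sender):
        digest = hashlib.sha1(sender.lower().encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, sender):
        i = self._index(sender)
        self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, sender):
        i = self._index(sender)
        return bool(self.bits[i // 8] & (1 << (i % 8)))


class CountingClassifier:
    """Flags a message once its body has reached `threshold` distinct recipients."""

    def __init__(self, threshold, white_list):
        self.threshold = threshold      # hypothetical tuning parameter
        self.white_list = white_list
        self.recipients = {}            # body fingerprint -> set of recipients

    def observe(self, sender, recipient, body):
        """Record one delivery; return True if it is flagged as massive mail."""
        if sender in self.white_list:
            return False                # white-listed senders are never flagged
        key = hashlib.sha1(body.encode("utf-8")).hexdigest()
        seen = self.recipients.setdefault(key, set())
        seen.add(recipient)
        return len(seen) >= self.threshold


if __name__ == "__main__":
    clf = CountingClassifier(threshold=3, white_list=BitmapWhiteList())
    for user in ("u1@example.com", "u2@example.com", "u3@example.com"):
        print(user, clf.observe("bulk@example.net", user, "Buy now"))
```

Because a mail sent to a single recipient never reaches the threshold, a counter of this kind would not flag ordinary one-to-one mail. That is in line with the low false-positive rate the abstract reports, although the reported precision and recall come from the authors' full system, not from this sketch.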

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001
Hao Luo, Binxing Fang & Xiaochun Yun

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China
Kefei Chen & Xuejia Lai
Singapore Management University (SMU), 80 Stamford Road, 178902, Singapore
Robert Deng
Cryptography and Security Department Institute for Infocomm Research, Singapore
Jianying Zhou

About this paper

Cite this paper

Luo, H., Fang, B., Yun, X. (2006). A Counting-Based Method for Massive Spam Mail Classification. In: Chen, K., Deng, R., Lai, X., Zhou, J. (eds) Information Security Practice and Experience. ISPEC 2006. Lecture Notes in Computer Science, vol 3903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11689522_5

DOI: https://doi.org/10.1007/11689522_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33052-3
Online ISBN: 978-3-540-33058-5
eBook Packages: Computer Science, Computer Science (R0)
