Abstract
This paper describes a two-phase method for filtering spam mail based on textual information and hyperlinks. Because the body of a spam mail often contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from both the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information and less definite textual information. In our experiment, the method that fetches webpages improved the F-measure by 9.4% over the method that uses only the original email header and body.
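The link-following step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the feature representation, or how the two phases use the features, so the names `extract_links`, `tokenize`, `collect_features`, and `fetch_url`, the URL pattern, and the bag-of-words counts are all assumptions made for illustration. The sketch only gathers tokens from the email body and from the pages its hyperlinks point to; classification is left out.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser
from typing import Callable, List

# Assumed pattern: plain http(s) URLs up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip markup and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def extract_links(body: str) -> List[str]:
    """Return the hyperlinks found in an email body."""
    return URL_PATTERN.findall(body)


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fetch_url(url: str, timeout: float = 5.0) -> str:
    """Download a page with the standard library; raises OSError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_features(body: str, fetch: Callable[[str], str] = fetch_url) -> Counter:
    """Count tokens from the email body plus every page its links point to."""
    # Remove the URLs themselves so only the surrounding text is tokenized.
    body_text = html_to_text(URL_PATTERN.sub(" ", body))
    tokens = tokenize(body_text)
    for url in extract_links(body):
        try:
            page = fetch(url)
        except OSError:
            continue  # unreachable pages contribute no features
        tokens.extend(tokenize(html_to_text(page)))
    return Counter(tokens)
```

Passing `fetch` as a parameter keeps the network access replaceable, so the feature collection can be exercised with canned pages instead of live requests.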
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, SJ., Lee, SB., Kim, JW., Nam, IG. (2004). Two Phase Approach for Spam-Mail Filtering. In: Zhang, J., He, JH., Fu, Y. (eds) Computational and Information Science. CIS 2004. Lecture Notes in Computer Science, vol 3314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30497-5_124
DOI: https://doi.org/10.1007/978-3-540-30497-5_124
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24127-0
Online ISBN: 978-3-540-30497-5
eBook Packages: Computer Science, Computer Science (R0)