Two Phase Approach for Spam-Mail Filtering

Kang, Sin-Jae; Lee, Sae-Bom; Kim, Jong-Wan; Nam, In-Gil

doi:10.1007/978-3-540-30497-5_124

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3314)

Included in the following conference series:

International Conference on Computational and Information Science

Abstract

This paper describes a two-phase method for filtering spam mail based on textual information and hyperlinks. Since the body of a spam mail has little text information, it provides insufficient hints to distinguish spam mails from legitimate mails. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information and less definite textual information. In our experiment, the method of fetching webpages improved the F-measure by 9.4% over the method that uses only the original email header and body.
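
The following is a minimal Python sketch of the general idea in the abstract, not the authors' implementation. The abstract does not say what counts as definite information or which classifier handles the textual hints, so the choices here are assumptions made only for illustration: phase one treats a link to a known-bad host as definite information, and phase two counts spam-related terms in the email body plus the fetched pages. The names `extract_links`, `gather_text`, `two_phase_filter`, `blocked_hosts`, `spam_terms`, and `threshold` are all hypothetical, and the page fetcher is passed in as a function so the sketch runs without network access.

```python
import re
from typing import Callable, Iterable, List
from urllib.parse import urlparse

# Simple pattern for http(s) links; real email bodies need more careful parsing.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(body: str) -> List[str]:
    """Return the http(s) URLs found in an email body, in order of appearance."""
    return URL_RE.findall(body)


def gather_text(body: str, fetch: Callable[[str], str]) -> str:
    """Concatenate the email body with the text of every linked page."""
    pages = []
    for url in extract_links(body):
        try:
            pages.append(fetch(url))
        except Exception:
            # A dead or unreachable link simply contributes no extra text.
            continue
    return "\n".join([body, *pages])


def two_phase_filter(
    body: str,
    fetch: Callable[[str], str],
    blocked_hosts: Iterable[str],
    spam_terms: Iterable[str],
    threshold: int = 2,
) -> str:
    """Label an email 'spam' or 'legitimate' in two phases (illustrative only)."""
    # Phase 1 (assumed form of "definite information"): a link to a known-bad host.
    hosts = {urlparse(url).hostname for url in extract_links(body)}
    if hosts & set(blocked_hosts):
        return "spam"

    # Phase 2 (assumed stand-in for a textual classifier): count spam terms
    # in the body plus the fetched pages and compare against a threshold.
    terms = set(spam_terms)
    words = re.findall(r"[a-z]+", gather_text(body, fetch).lower())
    hits = sum(1 for word in words if word in terms)
    return "spam" if hits >= threshold else "legitimate"
```

Passing a fetcher that returns an empty string reproduces the body-only baseline, which makes it easy to see how text from linked pages can change the decision for an email whose body carries almost no words of its own.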

Author information

Authors and Affiliations

School of Computer and Information Technology, Daegu University, Gyeongsan, Gyeongbuk, 712-714, South Korea
Sin-Jae Kang, Sae-Bom Lee, Jong-Wan Kim & In-Gil Nam

Editor information

Editors and Affiliations

School of Electronic Science and Technology, Anhui University, Hefei, Anhui, China
Jun Zhang
College of Science, Donghua University, 1882 Yan’an Xilu Road, 20051, Shanghai, China
Ji-Huan He
BASICS, Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Yuxi Fu

About this paper

Cite this paper

Kang, SJ., Lee, SB., Kim, JW., Nam, IG. (2004). Two Phase Approach for Spam-Mail Filtering. In: Zhang, J., He, JH., Fu, Y. (eds) Computational and Information Science. CIS 2004. Lecture Notes in Computer Science, vol 3314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30497-5_124

DOI: https://doi.org/10.1007/978-3-540-30497-5_124
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24127-0
Online ISBN: 978-3-540-30497-5
eBook Packages: Computer Science, Computer Science (R0)
