Classifying E-Mails Via Support Vector Machine

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Abstract

To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification technique. We treat E-mail messages as semi-structured documents, each consisting of a set of fields with predefined semantics and a number of variable-length free-text contents. The main contributions of this paper are as follows. First, we present a Support Vector Machine (SVM) based model that incorporates Principal Component Analysis (PCA) to reduce both the size of the data and the dimensionality of the input feature space. As a result, the input data can be classified with fewer features, and training converges faster. Second, we build the classification model using both the \(\mathcal{C}\)-support vector machine and \(\nu\)-support vector machine algorithms, and we study various control parameters for performance tuning in an extensive set of experiments. The results of our performance evaluation indicate that the proposed technique is effective in E-mail classification.
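The semi-structured view of a message described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the choice of header fields, the tokenizer, the sample message, and the names `parse_email` and `to_vector` are all assumptions made for illustration, not the feature set or code used in the paper. The resulting numeric rows are the kind of input that a dimensionality-reduction step such as PCA and an SVM classifier would then consume.

```python
from collections import Counter
from email import message_from_string
import re

# Hypothetical choice of header fields; the paper's actual field set is not given here.
STRUCTURED_FIELDS = ("From", "To", "Subject", "Date")

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def parse_email(raw):
    """Split a raw message into structured header fields and free-text term counts."""
    msg = message_from_string(raw)
    fields = {name: msg.get(name, "") for name in STRUCTURED_FIELDS}
    # Collect the text/plain parts; attachments and HTML are ignored in this sketch.
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
    else:
        parts = [msg]
    body = "\n".join(str(p.get_payload()) for p in parts)
    return {"fields": fields, "body_terms": Counter(tokenize(body))}


def to_vector(parsed, vocabulary):
    """Map term counts onto a fixed vocabulary, giving one numeric row per message."""
    return [parsed["body_terms"][term] for term in vocabulary]


# Made-up sample message, for illustration only.
RAW = """From: promo@example.com
To: alice@example.com
Subject: Win a free prize
Date: Mon, 1 May 2006 10:00:00 +0000

Claim your free prize now. Free entry!
"""

parsed = parse_email(RAW)
vocab = ["free", "prize", "meeting"]
print(parsed["fields"]["Subject"])
print(to_vector(parsed, vocab))
```

In a real corpus the vocabulary would be far larger than three terms, which is where reducing the dimensionality of the feature space before training becomes relevant.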

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shou, L., Cui, B., Chen, G., Dong, J. (2006). Classifying E-Mails Via Support Vector Machine. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_36

  • DOI: https://doi.org/10.1007/11775300_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35225-9

  • Online ISBN: 978-3-540-35226-6

  • eBook Packages: Computer Science (R0)
