Abstract
To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification technique. We treat E-mail messages as semi-structured documents, each consisting of a set of fields with predefined semantics and a number of variable-length free-text fields. The main contributions of this paper are as follows. First, we present a Support Vector Machine (SVM) based model that incorporates Principal Component Analysis (PCA) to reduce both the size of the data and the dimensionality of the input feature space. As a result, the input data can be classified with fewer features, and training converges faster. Second, we build the classification model using both the \(C\)-support vector machine and the \(\nu\)-support vector machine algorithms. Various control parameters for performance tuning are studied in an extensive set of experiments. The results of our performance evaluation indicate that the proposed technique is effective in E-mail classification.
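The two-stage idea in the abstract (reduce dimensionality with PCA, then train an SVM on the reduced features) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The toy data generator, the helper names (`make_data`, `fit_pca`, `project`, `train_linear_svm`, `demo`) and all parameter values are assumptions made for illustration. PCA is computed by power iteration with deflation, and the classifier is a linear soft-margin SVM of the \(C\) type trained by plain subgradient descent, which stands in for a proper quadratic-programming solver. The \(\nu\)-SVM variant and the extraction of features from real E-mail fields are not shown.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_data(n_per_class, dim=6, seed=0):
    """Hypothetical stand-in for e-mail feature vectors: two classes (+1, -1)
    that differ along features 0 and 1; the other features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.3) for _ in range(dim)]
            row[0] += 2.0 * label
            row[1] += 1.0 * label
            X.append(row)
            y.append(label)
    return X, y

def fit_pca(X, k, iters=100, seed=0):
    """Return (mean, components): the top-k eigenvectors of the sample
    covariance matrix, found by power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [dot(cov[i], v) for i in range(d)]
            norm = dot(w, w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(cov[i], v) for i in range(d)])
        components.append(v)
        # Deflation: subtract the component just found from the covariance.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return mean, components

def project(X, mean, components):
    """Center each row and express it in the principal-component basis."""
    return [[dot([xj - mj for xj, mj in zip(row, mean)], c) for c in components]
            for row in X]

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = w[:], 0.0  # gradient of the regularization term
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # margin violated: hinge is active
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def demo():
    X_train, y_train = make_data(40, seed=0)
    X_test, y_test = make_data(20, seed=1)
    mean, comps = fit_pca(X_train, k=2)          # 6 features -> 2 features
    Z_train = project(X_train, mean, comps)
    Z_test = project(X_test, mean, comps)        # reuse the training-set PCA
    w, b = train_linear_svm(Z_train, y_train)
    predictions = [1 if dot(w, z) + b >= 0 else -1 for z in Z_test]
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return accuracy, len(Z_train[0])

if __name__ == "__main__":
    print(demo())
```

Note that the PCA mean and components are fitted on the training set only and then reused to project the test set, so the classifier never sees statistics derived from held-out messages.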
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shou, L., Cui, B., Chen, G., Dong, J. (2006). Classifying E-Mails Via Support Vector Machine. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_36
DOI: https://doi.org/10.1007/11775300_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)