Abstract
Spam detection has become a necessity for successful email communication, security and convenience. This paper describes a learning process in which the text of incoming emails is analysed and filtered according to the salient features identified. The method gives promising results and, at the same time, performs significantly better than other statistical and probabilistic methods. The salient features of emails are selected automatically using functions that combine word frequency with other discriminating matrices, and emails are then encoded into a representative vector model. Several classifiers are then used to identify spam, and self-organising maps seem to give significantly better results.
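The feature-selection and encoding steps summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function (absolute difference of relative word frequencies between spam and legitimate mail), the tokenizer, and the names `tokenize`, `select_salient_words` and `encode` are all assumptions made for illustration. The abstract does not specify the actual weighting functions.

```python
from collections import Counter

PUNCTUATION = ".,!?:;\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def select_salient_words(spam_docs, ham_docs, k):
    """Pick the k words whose relative frequency differs most between classes.

    Hypothetical scoring: |freq_in_spam - freq_in_ham|. The paper's own
    weighting functions are not given in the abstract.
    """
    spam_counts = Counter(w for d in spam_docs for w in tokenize(d))
    ham_counts = Counter(w for d in ham_docs for w in tokenize(d))
    spam_total = sum(spam_counts.values()) or 1
    ham_total = sum(ham_counts.values()) or 1

    def score(word):
        return abs(spam_counts[word] / spam_total - ham_counts[word] / ham_total)

    vocab = set(spam_counts) | set(ham_counts)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


def encode(text, features):
    """Encode an email as a vector of counts, one entry per selected word."""
    counts = Counter(tokenize(text))
    return [counts[f] for f in features]
```

Vectors produced by `encode` would then be passed to a classifier such as a self-organising map, which this sketch does not implement.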
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vrusias, B., Golledge, I. (2009). Adaptable Text Filters and Unsupervised Neural Classifiers for Spam Detection. In: Corchado, E., Zunino, R., Gastaldo, P., Herrero, Á. (eds) Proceedings of the International Workshop on Computational Intelligence in Security for Information Systems CISIS’08. Advances in Soft Computing, vol 53. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88181-0_25
DOI: https://doi.org/10.1007/978-3-540-88181-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88180-3
Online ISBN: 978-3-540-88181-0
eBook Packages: Engineering, Engineering (R0)