Abstract
In this paper, we investigate how to combine multiple e-mail filters using multivariate statistical analysis, so that together they form a stronger barrier to spam than any single filter alone. Three evaluation criteria are proposed for cost-sensitive filters, and their rationale is discussed. Furthermore, we describe a principle that minimizes the error cost, in order to avoid misclassifying a legitimate e-mail as spam. Compared with other major methods, our method of combining multiple filters performs better in the experiments when appropriate running parameters are adopted.
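The abstract does not spell out the combination rule or the three evaluation criteria. As a rough illustration only, the sketch below assumes one common multivariate-statistics approach: each base filter's spam score is treated as one variable, a two-class Gaussian linear discriminant is fitted over those variables, and a minimum-expected-cost threshold is applied so that filing a legitimate message as spam costs more than letting a spam through. The function names `fit_lda` and `classify`, the cost ratio `lam`, and the synthetic scores are all hypothetical. This is not presented as the authors' method.

```python
import numpy as np


def fit_lda(scores, labels):
    """Fit a two-class Gaussian linear discriminant on base-filter scores.

    scores: (n_messages, n_filters) array, one column per base filter
            (hypothetically, each column is one filter's spam score).
    labels: 1 for spam, 0 for legitimate.
    Returns (w, b) such that scores @ w + b estimates the log posterior odds
    log P(spam | x) / P(legitimate | x), assuming both classes are Gaussian
    with a shared covariance matrix.
    """
    spam = scores[labels == 1]
    legit = scores[labels == 0]
    mu_s, mu_l = spam.mean(axis=0), legit.mean(axis=0)
    # Pooled within-class covariance, shared by both classes.
    centered = np.vstack([spam - mu_s, legit - mu_l])
    sigma = centered.T @ centered / (len(scores) - 2)
    w = np.linalg.solve(sigma, mu_s - mu_l)
    # Intercept: midpoint term plus log prior odds estimated from the training set.
    b = -0.5 * (mu_s + mu_l) @ w + np.log(len(spam) / len(legit))
    return w, b


def classify(scores, w, b, lam=9.0):
    """Minimum-expected-cost decision (hypothetical cost ratio lam).

    lam is the cost of filing a legitimate message as spam relative to the
    cost of letting one spam through. Choosing "spam" is cheaper only when
    P(spam | x) > lam * P(legitimate | x), i.e. when the log odds exceed log(lam).
    Returns 1 for spam, 0 for legitimate.
    """
    log_odds = scores @ w + b
    return (log_odds > np.log(lam)).astype(int)


# Usage on synthetic scores from three hypothetical base filters.
rng = np.random.default_rng(0)
legit_scores = rng.normal(0.2, 0.1, size=(200, 3))
spam_scores = rng.normal(0.8, 0.1, size=(200, 3))
X = np.vstack([legit_scores, spam_scores])
y = np.array([0] * 200 + [1] * 200)

w, b = fit_lda(X, y)
predictions = classify(X, w, b, lam=9.0)
print("training accuracy:", (predictions == y).mean())
```

Raising `lam` can only shrink the set of messages labelled spam, which is how a cost-sensitive filter trades a few missed spams for fewer legitimate messages lost.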
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, W., Zhong, N., Liu, C. (2006). Combining Multiple Email Filters Based on Multivariate Statistical Analysis. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_81
DOI: https://doi.org/10.1007/11875604_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45764-0
Online ISBN: 978-3-540-45766-4
eBook Packages: Computer Science, Computer Science (R0)