Abstract
This paper focuses on the problem of designing effective spam filters using combined Näive Bayes classifiers. Firstly, we describe different tokenization methods which allow us for extracting valuable features from the e-mails. The methods are used to create training sets for individual Bayesian classifiers, because different methods of feature extraction ensure the desirable diversity of classifier ensemble. Because of the lack of an adequate analytical methods of ensemble evaluation the most valuable and diverse committees are chosen on the basis of computer experiments which are carried out on the basis of our own spam dataset. Then the number of well known fusion methods using class labels and class supports are compared to establish the final proposition.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Biggio, B., Fumera, G., Roli, F.: Multiple Classifier Systems under Attack. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 74–83. Springer, Heidelberg (2010)
Hershkop, S., Stolfo, S.J.: Combining email models for false positive reduction. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 98–107. ACM, New York (2005)
Kurlej, B., Wozniak, M.: Active learning approach to concept drift problem. Logic Journal of the IGPL 20(3), 550–559 (2012)
Erdélyi, M., Benczúr, A.A., Masanés, J., Siklósi, D.: Web spam filtering in internet archives. In: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2009, pp. 17–20. ACM, New York (2009)
Pu, C., Webb, S.: Observed trends in spam construction techniques: A case study of spam evolution. In: CEAS (2006)
Henzinger, M.R., Motwani, R., Silverstein, C.: Challenges in web search engines. SIGIR Forum 36(2), 11–22 (2002)
Lai, C.C., Tsai, M.C.: An empirical performance comparison of machine learning methods for spam e-mail categorization. In: Proceedings of the Fourth International Conference on Hybrid Intelligent Systems, HIS 2004, pp. 44–48. IEEE Computer Society, Washington, DC (2004)
Sahami, M., Dumais, S., Heckerman, D., Horvitz, E.: A bayesian approach to filtering junk E-mail. In: Learning for Text Categorization: Papers from the 1998 Workshop, Madison, Wisconsin. AAAI Technical Report WS-98-05 (1998)
Graham, P.: A plan for spam (August 2002), http://www.paulgraham.com/spam.html
Siefkes, C., Assis, F., Chhabra, S., Yerazunis, W.S.: Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 410–421. Springer, Heidelberg (2004)
Gargiulo, F., Penta, A., Picariello, A., Sansone, C.: A personal antispam system based on a behaviour-knowledge space approach. In: Okun, O., Valentini, G. (eds.) Applications of Supervised and Unsupervised Ensemble Methods. SCI, vol. 245, pp. 39–57. Springer, Heidelberg (2009)
Blanzieri, E., Bryl, A.: A survey of learning-based techniques of email spam filtering. Artif. Intell. Rev. 29(1), 63–92 (2008)
Erdélyi, M., Garzó, A., Benczúr, A.A.: Web spam classification: a few features worth more. In: Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality. WebQuality 2011, pp. 27–34. ACM, New York (2011)
Kuncheva, L.I.: Combining pattern classifiers: methods and algorithms. Wiley-Interscience (2004)
Wozniak, M.: Proposition of common classifier construction for pattern recognition with context task. Knowledge-Based Systems 19(8), 617–624 (2006)
van Erp, M., Vuurpijl, L., Schomaker, L.: An overview and comparison of voting methods for pattern recognition. In: Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002. IEEE Computer Society, Washington, DC (2002)
Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus (2006)
Duin, R.P.W.: The combining classifier: To train or not to train? In: International Conference on Pattern Recognition, vol. 2, p. 20765 (2002)
Jacobs, R.A.: Methods for combining experts’ probability assessments. Neural Comput. 7(5), 867–888 (1995)
Burduk, R.: Imprecise information in bayes classifier. Pattern Anal. Appl. 15(2), 147–153 (2012)
Shipp, C.A., Kuncheva, L.I.: Relationships between combination methods and measures of diversity in combining classifiers. Information Fusion 3, 135–148 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wrótniak, K., Woźniak, M. (2013). Combined Bayesian Classifiers Applied to Spam Filtering Problem. In: Herrero, Á., et al. International Joint Conference CISIS’12-ICEUTE´12-SOCO´12 Special Sessions. Advances in Intelligent Systems and Computing, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33018-6_26
Download citation
DOI: https://doi.org/10.1007/978-3-642-33018-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33017-9
Online ISBN: 978-3-642-33018-6
eBook Packages: EngineeringEngineering (R0)