Combined Bayesian Classifiers Applied to Spam Filtering Problem

Wrótniak, Karol; Woźniak, Michał

doi:10.1007/978-3-642-33018-6_26


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 189)


Abstract

This paper focuses on the problem of designing effective spam filters using combined Naïve Bayes classifiers. First, we describe different tokenization methods that allow us to extract valuable features from e-mails. These methods are used to create training sets for the individual Bayesian classifiers, because different feature extraction methods ensure the desired diversity of the classifier ensemble. Because adequate analytical methods of ensemble evaluation are lacking, the most valuable and diverse committees are chosen on the basis of computer experiments carried out on our own spam dataset. A number of well-known fusion methods using class labels and class supports are then compared to establish the final proposition.
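
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three tokenizers (words, word bigrams, character trigrams), the two fusion rules (majority voting over class labels and the mean of class supports), the Laplace smoothing, and the toy messages are all assumptions chosen for illustration, since the abstract does not name the specific methods or the dataset.

```python
import math
import re
from collections import Counter


def word_tokens(text):
    """Lower-cased alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigram_tokens(text):
    """Pairs of adjacent words."""
    words = word_tokens(text)
    return [a + " " + b for a, b in zip(words, words[1:])]


def char_trigrams(text):
    """Overlapping three-character windows over whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + 3] for i in range(len(s) - 2)]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over one tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(self.tokenizer(text))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def supports(self, text):
        """Return class supports (normalized posteriors) that sum to 1."""
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in self.tokenizer(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.token_counts[c][tok]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the maximum for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def majority_vote(members, text):
    """Label-based fusion: each member votes for its most supported class."""
    votes = Counter(max(s, key=s.get) for s in (m.supports(text) for m in members))
    return votes.most_common(1)[0][0]


def mean_support(members, text):
    """Support-based fusion: average the class supports, then take the maximum."""
    all_supports = [m.supports(text) for m in members]
    mean = {c: sum(s[c] for s in all_supports) / len(all_supports)
            for c in all_supports[0]}
    return max(mean, key=mean.get)


# Made-up toy messages, for illustration only (not the authors' dataset).
train_texts = [
    "win cheap pills now", "cheap pills win money", "claim free money now",
    "meeting agenda for monday", "project report attached",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# One classifier per tokenizer, so the members differ in their feature space.
members = [NaiveBayes(t).fit(train_texts, train_labels)
           for t in (word_tokens, bigram_tokens, char_trigrams)]

print(majority_vote(members, "cheap pills free money"))
print(mean_support(members, "project meeting on monday"))
```

Using an odd number of members keeps the two-class majority vote free of ties. Averaging supports instead lets a confident member outweigh uncertain ones, which is one reason the two families of fusion rules can disagree.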



Author information

Authors and Affiliations

Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Karol Wrótniak & Michał Woźniak


Editor information

Editors and Affiliations

Department of Civil Engineering, University of Burgos, Campus Vena (Edif.C), C/ Francisco de Vitoria, s/n, Burgos, 09006, Spain
Álvaro Herrero
VŠB-TU Ostrava, 17. listopadu 15, Ostrava, 70833, Czech Republic
Václav Snášel
MIR Labs, Scientific Network for Innovation, Machine Intelligence Research Labs, Auburn, 98071, USA
Ajith Abraham
VŠB-TU Ostrava, 17. listopadu 15, Ostrava, 70833, Czech Republic
Ivan Zelinka
Department of Civil Engineering, University of Burgos, Campus Vena (Edif.C), C/ Francisco de Vitoria, s/n, Burgos, 09006, Spain
Bruno Baruque
Universidad de Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Héctor Quintián
University of Coruña, Avda. 19 de febrero, s/n, Coruña, 15405 A, Spain
José Luis Calvo
Instituto Tecnológico de Castilla y León, Pol. Ind. Villalonquéjar, Lopez Bravo 70, Burgos, 09001, Spain
Javier Sedano
Universidad de Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Emilio Corchado


About this paper

Cite this paper

Wrótniak, K., Woźniak, M. (2013). Combined Bayesian Classifiers Applied to Spam Filtering Problem. In: Herrero, Á., et al. International Joint Conference CISIS’12-ICEUTE´12-SOCO´12 Special Sessions. Advances in Intelligent Systems and Computing, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33018-6_26


DOI: https://doi.org/10.1007/978-3-642-33018-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33017-9
Online ISBN: 978-3-642-33018-6
eBook Packages: Engineering, Engineering (R0)
