
Combining Multiple Email Filters Based on Multivariate Statistical Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4203)

Abstract

In this paper, we investigate how to combine multiple e-mail filters using multivariate statistical analysis, so that the combination provides a stronger barrier against spam than any single filter alone. We propose three evaluation criteria for cost-sensitive filters and discuss their rationale. We also describe a principle that minimizes the error cost, in order to avoid misclassifying legitimate e-mail as spam. In comparisons with other major methods, the experimental results show that our method of combining multiple filters performs favorably when appropriate running parameters are chosen.
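The abstract does not specify the combination rule or the three evaluation criteria. Purely as an illustration, the sketch below assumes that each filter emits a spam score in [0, 1] and combines the resulting score vector with Fisher's linear discriminant. This is one standard multivariate technique, applied under an assumed shared-covariance Gaussian model, and is not necessarily the authors' method. It then applies a minimum-cost rule that labels a message as spam only when P(spam | x) > λ · P(legitimate | x), where λ is the assumed cost of a false positive relative to a false negative. The names `LdaCombiner`, `cost_ratio` and `ridge`, the default λ = 9, and the toy scores are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of score vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Pooled within-class covariance of two groups of score vectors."""
    d = len(a[0])
    acc = [[0.0] * d for _ in range(d)]
    for rows in (a, b):
        mu = mean(rows)
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    acc[i][j] += diff[i] * diff[j]
    dof = len(a) + len(b) - 2
    return [[acc[i][j] / dof for j in range(d)] for i in range(d)]


def solve(m: Matrix, v: Vector) -> Vector:
    """Solve m @ x = v by Gauss-Jordan elimination with partial pivoting."""
    d = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(d):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][d] / aug[i][i] for i in range(d)]


class LdaCombiner:
    """Hypothetical combiner: Fisher LDA over k filter scores plus a
    minimum-cost decision rule (not the paper's published method)."""

    def __init__(self, spam, legit, cost_ratio: float = 9.0, ridge: float = 1e-6):
        mu_s, mu_l = mean(spam), mean(legit)
        cov = pooled_covariance(spam, legit)
        for i in range(len(cov)):
            cov[i][i] += ridge  # small ridge keeps the covariance invertible
        # Discriminant direction w = Sigma^{-1} (mu_spam - mu_legit).
        self.w = solve(cov, [s - l for s, l in zip(mu_s, mu_l)])
        midpoint = [(s + l) / 2 for s, l in zip(mu_s, mu_l)]
        # Under the shared-covariance Gaussian model, this bias makes
        # log_odds() equal log P(spam|x) - log P(legit|x).
        self.bias = math.log(len(spam) / len(legit)) - dot(self.w, midpoint)
        self.log_lambda = math.log(cost_ratio)

    def log_odds(self, scores: Sequence[float]) -> float:
        return dot(self.w, scores) + self.bias

    def is_spam(self, scores: Sequence[float]) -> bool:
        # Minimum expected cost: block only if P(spam|x) > lambda * P(legit|x).
        return self.log_odds(scores) > self.log_lambda


if __name__ == "__main__":
    # Toy spam probabilities from two hypothetical filters (made-up data).
    spam = [[0.9, 0.8], [0.85, 0.95], [0.7, 0.9], [0.95, 0.7]]
    legit = [[0.1, 0.2], [0.2, 0.05], [0.3, 0.1], [0.05, 0.3]]
    combiner = LdaCombiner(spam, legit, cost_ratio=9.0)
    for msg in ([0.9, 0.9], [0.6, 0.4], [0.1, 0.1]):
        print(msg, round(combiner.log_odds(msg), 2), combiner.is_spam(msg))
```

Raising `cost_ratio` moves the decision threshold on the log-odds from log 1 = 0 toward log 999 ≈ 6.9. As a result, fewer borderline messages are blocked, which is the behaviour the abstract aims for when it seeks to avoid filtering legitimate e-mail as spam.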





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, W., Zhong, N., Liu, C. (2006). Combining Multiple Email Filters Based on Multivariate Statistical Analysis. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol. 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_81


  • DOI: https://doi.org/10.1007/11875604_81

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45764-0

  • Online ISBN: 978-3-540-45766-4

  • eBook Packages: Computer Science, Computer Science (R0)
