
Time-Sensitive Sampling for Spam Filtering

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3060)

Abstract

Email filters based on learned models should be developed from appropriate training and test sets. In the literature, k-fold cross-validation is commonly used to produce these data sets, which mixes old and new messages. We show that this yields overly optimistic estimates of the email filter’s accuracy in classifying future messages, because the training set is more likely to contain messages that are similar to those in the test set. We propose a sampling method that preserves the chronology of the email messages in the data sets.
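The contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which is not described on this page; it only assumes that each message carries a timestamp. The `(timestamp, label)` tuple layout and the names `chronological_split`, `shuffled_kfold`, and `leaks_future` are hypothetical, chosen for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical message representation: (timestamp, label).
Message = Tuple[int, str]


def chronological_split(
    messages: List[Message], train_fraction: float = 0.8
) -> Tuple[List[Message], List[Message]]:
    """Train on the oldest messages and test on the newest ones."""
    ordered = sorted(messages, key=lambda m: m[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def shuffled_kfold(
    messages: List[Message], k: int = 5, seed: int = 0
) -> List[List[Message]]:
    """Conventional k-fold partition: shuffle first, so old and new mix."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def leaks_future(train: List[Message], test: List[Message]) -> bool:
    """True if some training message is newer than the oldest test message."""
    return max(m[0] for m in train) > min(m[0] for m in test)


if __name__ == "__main__":
    # Synthetic data for illustration only: 100 messages, one per time step.
    data = [(t, "spam" if t % 2 else "ham") for t in range(100)]

    train, test = chronological_split(data)
    print("chronological split leaks future:", leaks_future(train, test))

    folds = shuffled_kfold(data)
    kf_test = folds[0]
    kf_train = [m for fold in folds[1:] for m in fold]
    print("shuffled k-fold leaks future:", leaks_future(kf_train, kf_test))
```

With a chronological split, every training message predates every test message, so the evaluation resembles classifying future mail. With a shuffled k-fold partition, the training folds will almost always contain messages newer than some test messages, which is the mixing the abstract identifies as the source of optimistic estimates.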




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, CL., Silver, D. (2004). Time-Sensitive Sampling for Spam Filtering. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science, vol 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_54


  • DOI: https://doi.org/10.1007/978-3-540-24840-8_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22004-6

  • Online ISBN: 978-3-540-24840-8

  • eBook Packages: Springer Book Archive
