Text Mining for Spam Filtering

  • Reference work entry
Encyclopedia of Machine Learning

References

  • Bratko, A., Cormack, G. V., Filipic, B., Lynam, T. R., & Zupan, B. (2006). Spam filtering using statistical data compression models. Journal of Machine Learning Research, 7, 2673–2698.

  • Carreras, X., & Màrquez, L. (2001). Boosting trees for anti-spam email filtering. In Proceedings of RANLP-01, the 4th international conference on recent advances in natural language processing. New York: ACM.

  • Cormack, G. V., & Lynam, T. R. (2006). On-line supervised spam filter evaluation. ACM Transactions on Information Systems, 25(3), 11.

  • Dalvi, N., Domingos, P., Sanghai, M. S., & Verma, D. (2004). Adversarial classification. In Proceedings of the tenth international conference on knowledge discovery and data mining (Vol. 1, pp. 99–108). New York: ACM.

  • Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5), 1048–1054.

  • Fawcett, T. (2003). “In vivo” spam filtering: A challenge problem for data mining. KDD Explorations, 5(2), 140–148.

  • Goodman, J., & Yih, W. (2006). Online discriminative spam filter training. In Proceedings of the third conference on email and anti-spam. Mountain View, CA. (CEAS-2006).

  • Kołcz, A. (2005). Local sparsity control for naive Bayes with extreme misclassification costs. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM.

  • Kołcz, A., & Alspector, J. (2001). SVM-based filtering of e-mail spam with content-specific misclassification costs. TextDM’2001 (IEEE ICDM-2001 workshop on text mining), San Jose, CA.

  • Kołcz, A., Bond, M., & Sargent, J. (2006). The challenges of service-side personalized spam filtering: Scalability and beyond. In Proceedings of the first international conference on scalable information systems (INFOSCALE). New York: ACM.

  • Kołcz, A. M., & Chowdhury, A. (2007). Hardening fingerprinting by context. In Proceedings of the fourth international conference on email and anti-spam.

  • Lowd, D., & Meek, C. (2005). Good word attacks on statistical spam filters. In Proceedings of the second conference on email and anti-spam. Mountain View, CA. (CEAS-2005).

  • Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam filtering with naive Bayes - which naive Bayes? In Proceedings of the third conference on email and anti-spam. (CEAS-2006).

  • Rigoutsos, I., & Huynh, T. (2004). Chung-Kwei: a pattern-discovery-based system for the automatic identification of unsolicited e-mail messages (SPAM). In Proceedings of the first conference on email and anti-spam. (CEAS-2004).

  • Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian approach to filtering junk email. AAAI workshop on learning for text categorization, Madison, Wisconsin. AAAI Technical Report WS-98-05.

  • Sakkis, G., Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Spyropoulos, C. D., & Stamatopoulos, P. (2001). Stacking classifiers for anti-spam filtering of e-mail. In L. Lee & D. Harman (Eds.). Proceedings of empirical methods in natural language processing (EMNLP 2001) (pp. 44–50). http://www.cs.cornell.edu/home/llee/emnlp/proceeding.html.

  • Sculley, D., & Wachman, G. (2007). Relaxed online support vector machines for spam filtering. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. New York: ACM.

  • Segal, R., Crawford, J., Kephart, J., & Leiba, B. (2004). SpamGuru: An enterprise anti-spam filtering system. In Proceedings of the first conference on email and anti-spam. (CEAS-2004).

  • Siefkes, C., Assis, F., Chhabra, S., & Yerazunis, W. (2004). Combining Winnow and orthogonal sparse bigrams for incremental spam filtering. In Proceedings of the European conference on principles and practice of knowledge discovery in databases. New York: Springer.

  • Yoshida, K., Adachi, F., Washio, T., Motoda, H., Homma, T., Nakashima, A., et al. (2004). Density-based spam detection. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 486–493). New York: ACM.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Kołcz, A. (2011). Text Mining for Spam Filtering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_828
