Definition
Spam filtering is the process of detecting unsolicited commercial email (UCE) messages on behalf of an individual recipient or a group of recipients. Machine learning is applied to this problem to build discriminative models from labeled and unlabeled examples of spam and nonspam. Such models can serve populations of users (e.g., departments, corporations, ISP customers), or they can be personalized to reflect the judgments of an individual. An important aspect of spam detection is how the textual information contained in an email is extracted and used for discrimination.
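As an illustration of the definition above, the following is a minimal sketch of a text-based filter trained on labeled examples. It assumes a multinomial naive Bayes model with Laplace smoothing over lowercase word tokens, which is only one common choice and not necessarily the approach this entry goes on to describe. The class name NaiveBayesSpamFilter, the tokenize helper, the "spam"/"ham" labels, and the four toy training messages are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes filter (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        """Update token and message counts with one labeled example."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text).

        Requires at least one training example of each class.
        Tokens never seen in training are ignored.
        """
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_probs = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            log_p = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                log_p += math.log((counts[tok] + self.alpha) / denom)
            log_probs[label] = log_p
        return log_probs["spam"] - log_probs["ham"]

    def is_spam(self, text, threshold=0.0):
        """Flag the message when the log-odds exceed the threshold."""
        return self.spam_log_odds(text) > threshold


# Hypothetical training data, for illustration only.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("Cheap pills, buy now", "spam")
spam_filter.train("Win money now, click here", "spam")
spam_filter.train("Meeting agenda for Monday", "ham")
spam_filter.train("Lunch on Monday? See agenda attached", "ham")

print(spam_filter.is_spam("buy cheap pills now"))
print(spam_filter.is_spam("agenda for the Monday meeting"))
```

Raising the threshold above zero makes the filter more conservative about flagging mail, which is one simple way to reflect the higher cost of losing a legitimate message. Training a separate instance per user, rather than one shared instance, would correspond to the personalized setting mentioned in the definition.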
Motivation and Background
Spam has become the bane of existence for both Internet users and entities providing email services. Time is lost sifting through unwanted messages, and important emails may be lost through omission or accidental deletion. According to...
© 2017 Springer Science+Business Media New York
Kołcz, A. (2017). Text Mining for Spam Filtering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_828
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1