DOI: 10.1145/1553374.1553424
research-article

Bayesian clustering for email campaign detection

Published: 14 June 2009

ABSTRACT

We discuss the problem of clustering elements according to the sources that have generated them. For elements that are characterized by independent binary attributes, a closed-form Bayesian solution exists. We derive a solution for the case of dependent attributes that is based on a transformation of the instances into a space of independent feature functions. We derive an optimization problem that produces a mapping into a space of independent binary feature vectors; the features can reflect arbitrary dependencies in the input space. This problem setting is motivated by the application of spam filtering for email service providers. Spam traps deliver a real-time stream of messages known to be spam. If elements of the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.
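
The closed-form case mentioned in the abstract, independent binary attributes, can be illustrated with a standard conjugate Beta-Bernoulli model. The following is a minimal sketch under stated assumptions, not the authors' formulation: it assumes a Beta(alpha, beta) prior on each attribute's Bernoulli parameter (uniform by default), and the function names `log_marginal` and `log_merge_ratio` are hypothetical. It scores whether two groups of binary vectors are better explained by one source or by two.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(cluster, alpha=1.0, beta=1.0):
    """Log marginal likelihood of binary vectors drawn from one source.

    Assumption: attributes are independent, and each attribute's
    Bernoulli parameter has a Beta(alpha, beta) prior that is
    integrated out analytically.
    """
    n = len(cluster)
    dims = len(cluster[0])
    total = 0.0
    for j in range(dims):
        ones = sum(x[j] for x in cluster)  # count of 1s in attribute j
        total += log_beta(alpha + ones, beta + n - ones) - log_beta(alpha, beta)
    return total


def log_merge_ratio(c1, c2, alpha=1.0, beta=1.0):
    """Log Bayes factor: one shared source versus two separate sources."""
    merged = log_marginal(c1 + c2, alpha, beta)
    separate = log_marginal(c1, alpha, beta) + log_marginal(c2, alpha, beta)
    return merged - separate


if __name__ == "__main__":
    same = log_merge_ratio([[1, 1, 1, 1]], [[1, 1, 1, 1]])
    different = log_merge_ratio([[1, 1, 1, 1]], [[0, 0, 0, 0]])
    print(same > 0, different < 0)
```

A positive ratio favors placing both groups in one cluster and a negative ratio favors keeping them apart. The sketch covers only the independent-attribute case; the mapping of dependent attributes into independent feature functions that the abstract describes is not reproduced here.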


Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Copyright © 2009 by the author(s)/owner(s).

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
