research-article

Mining spam email to identify common origins for forensic application

Authors:
Chun Wei, Univ. of Alabama at Birmingham, Birmingham, AL
Alan Sprague, Univ. of Alabama at Birmingham, Birmingham, AL
Gary Warner, Univ. of Alabama at Birmingham, Birmingham, AL
Anthony Skjellum, Univ. of Alabama at Birmingham, Birmingham, AL

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing, March 2008, Pages 1433–1437, https://doi.org/10.1145/1363686.1364019

Published: 16 March 2008


ABSTRACT

In recent years, spam email has become a major tool for criminals conducting illegal business on the Internet. In this paper we describe a new research approach that uses data mining techniques to study spam emails, with a focus on law enforcement forensic analysis. After we retrieve useful attributes from spam emails, we use a connected components clustering algorithm to form relationships between messages. These initial clusters are then refined using a weighted edges model in which membership in a cluster requires the weight to exceed a chosen threshold. Cluster membership is validated by WHOIS data, by the IP address of the computer hosting the advertised sites, and through comparison of graphical images of website fetches. This technique has been successful in identifying relationships between spam campaigns that were not identified by human researchers, enabling additional data to be brought into a single investigation.
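The two clustering stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names (`subject`, `domain`), the per-attribute weights, the threshold value, and the rule that an edge's weight is the sum of the weights of matching attributes are all assumptions made for illustration. The first stage links any two messages that share an attribute value and takes connected components; the second stage keeps only edges whose weight exceeds the threshold and recomputes the components.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to compute connected components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def components(n, edges):
    """Return connected components of an n-node graph as sorted index lists."""
    uf = UnionFind(n)
    for a, b in edges:
        uf.union(a, b)
    groups = defaultdict(list)
    for i in range(n):
        groups[uf.find(i)].append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical attribute weights (assumption, not taken from the paper).
WEIGHTS = {"subject": 1.0, "domain": 2.0}


def edge_weight(a, b):
    """Sum the weights of attributes on which two messages agree."""
    return sum(
        w for attr, w in WEIGHTS.items()
        if a.get(attr) is not None and a.get(attr) == b.get(attr)
    )


def cluster(emails, threshold):
    """Return (initial, refined) clusterings of the messages."""
    n = len(emails)
    pairs = [
        (i, j, edge_weight(emails[i], emails[j]))
        for i, j in combinations(range(n), 2)
    ]
    # Stage 1: any shared attribute value links two messages.
    initial = components(n, [(i, j) for i, j, w in pairs if w > 0])
    # Stage 2: keep only edges whose weight exceeds the threshold.
    refined = components(n, [(i, j) for i, j, w in pairs if w > threshold])
    return initial, refined


# Made-up example messages.
emails = [
    {"subject": "Cheap meds", "domain": "pills-a.example"},
    {"subject": "Cheap meds", "domain": "pills-b.example"},
    {"subject": "Best prices", "domain": "pills-b.example"},
    {"subject": "Win big", "domain": "casino.example"},
]
initial, refined = cluster(emails, threshold=1.5)
print(initial)  # [[0, 1, 2], [3]]
print(refined)  # [[0], [1, 2], [3]]
```

In this toy data, message 0 joins the initial cluster only through a shared subject line; with the assumed weights that link falls below the threshold, so the refinement step separates it from the two messages that advertise the same domain.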


Index Terms

1. Information systems
  1. World Wide Web
    1. Web applications
      1. Internet communications tools
2. Social and professional topics
  1. Computing / technology policy


Published in
SAC '08: Proceedings of the 2008 ACM symposium on Applied computing, March 2008, 2586 pages
ISBN: 9781595937537
DOI: 10.1145/1363686
Conference Chairs: Roger L. Wainwright (University of Tulsa), Hisham M. Haddad (Kennesaw State University)
Copyright © 2008 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
cyber crime
data mining
electronic mail
forensic analysis
spam
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%