DOI: 10.1145/1299015.1299020

Fighting unicode-obfuscated spam

Published: 04 October 2007

Abstract

In the last few years, spammers have increasingly used obfuscation to get spam emails past filters. The standard method is to use images that look like text, since typical spam filters are unable to parse such messages; this is the technique used in so-called "rock phishing". To fight image-based spam, many spam filters apply heuristic rules that flag emails containing images; since few legitimate emails consist mainly of one large image, this helps detect image-based spam. Spammers are therefore interested in circumventing these methods. Unicode transliteration is a convenient tool for them, since it allows a spammer to create a large number of homomorphic clones of a message that all look the same: because Unicode contains many characters that are distinct but appear very similar, spammers can substitute a message's characters at random to hide blacklisted words in an effort to bypass filters. To defend against these Unicode-obfuscated spam emails, we developed a prototype tool that can be used with SpamAssassin to block spam obfuscated in this way by mapping polymorphic messages to a common, more homogeneous representation. This representation can then be filtered using traditional methods. We demonstrate the ease with which Unicode polymorphism can be used to circumvent spam filters such as SpamAssassin, and then describe a de-obfuscation technique that can be used to catch messages that have been obfuscated in this fashion.
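The idea of mapping polymorphic messages to a common representation can be illustrated with a minimal Python sketch. This is not the authors' prototype and does not use SpamAssassin's API; the function names `deobfuscate` and `contains_blacklisted` and the small `HOMOGLYPHS` table are hypothetical, chosen only for illustration. It assumes that compatibility normalization (NFKD) plus a hand-written look-alike table is enough to show the principle; a realistic table would have to be far larger.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table (an assumption for this
# sketch, not the paper's mapping). Keys are lower-case characters that
# resemble the Latin letter they map to.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03bd": "v",  # GREEK SMALL LETTER NU
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
}


def deobfuscate(text: str) -> str:
    """Map look-alike variants of a message to one canonical lower-case form."""
    # NFKD folds compatibility characters (e.g. fullwidth letters) to their
    # plain equivalents and splits accented letters into base + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    canonical = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents and other combining marks
        ch = ch.lower()
        canonical.append(HOMOGLYPHS.get(ch, ch))
    return "".join(canonical)


def contains_blacklisted(text: str, blacklist: set[str]) -> bool:
    """Run a traditional substring blacklist on the canonical form."""
    canonical = deobfuscate(text)
    return any(word in canonical for word in blacklist)
```

For example, `contains_blacklisted("Buy V\u0456\u0430gr\u0430 now", {"viagra"})` matches even though the raw string contains Cyrillic letters, because every clone is reduced to the same canonical text before the conventional filter runs.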


Published In

eCrime '07: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit
October 2007
90 pages
ISBN:9781595939395
DOI:10.1145/1299015
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SpamAssassin
  2. deobfuscated emails
  3. obfuscated emails
  4. spam emails
  5. unicode characters

Qualifiers

  • Article

Conference

eCrime '07
