Detecting Image Based Spam Email

Ma, Wanli; Tran, Dat; Sharma, Dharmendra

doi:10.1007/978-3-540-77368-9_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4413)

Abstract

Image-based spam email can easily circumvent the widely used text-based spam filters, and more and more spammers are adopting the technique. The ability to determine the nature of an email from its image content is therefore urgently needed. We propose to use OCR (optical character recognition) to extract the text embedded in the images and then assess the email from the extracted text using the same text-based engine. This approach avoids maintaining a separate image-based detection engine and takes advantage of the strong and reasonably mature text-based engine. Its success relies on the accuracy of the OCR; however, no matter how good an OCR engine is, misrecognition is unavoidable. We therefore also propose a Markov model that can tolerate misspellings. The proposed solution can be integrated smoothly into existing spam email filters.
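As an illustration only, the idea of scoring OCR output with a misspelling-tolerant Markov model might be sketched as below. This is not the authors' implementation: the class name `CharTrigramModel`, the add-k smoothing, the decision threshold, and the tiny training corpora are all assumptions made for the sketch, and the OCR step is omitted (the input strings stand in for text already extracted from an image). A character-level trigram model is used because a single misrecognized character disturbs only the few trigrams that contain it.

```python
import math
from collections import defaultdict


class CharTrigramModel:
    """Character-level trigram Markov model with add-k smoothing (hypothetical sketch)."""

    def __init__(self, k=1.0):
        self.k = k
        self.trigram_counts = defaultdict(int)  # counts of 3-character sequences
        self.context_counts = defaultdict(int)  # counts of the 2-character contexts
        self.alphabet = set()

    @staticmethod
    def _normalize(text):
        # Lowercase, collapse whitespace, and pad so every character has a context.
        return "  " + " ".join(text.lower().split()) + " "

    def train(self, texts):
        for text in texts:
            s = self._normalize(text)
            self.alphabet.update(s)
            for i in range(len(s) - 2):
                self.trigram_counts[s[i:i + 3]] += 1
                self.context_counts[s[i:i + 2]] += 1

    def avg_log_prob(self, text):
        # Average per-trigram log-probability; smoothing keeps unseen
        # (for example, misrecognized) trigrams from zeroing out the score.
        s = self._normalize(text)
        v = len(self.alphabet) + 1  # one extra slot for characters never seen in training
        n = len(s) - 2
        total = 0.0
        for i in range(n):
            num = self.trigram_counts.get(s[i:i + 3], 0) + self.k
            den = self.context_counts.get(s[i:i + 2], 0) + self.k * v
            total += math.log(num / den)
        return total / n


def is_spam(text, spam_model, ham_model, threshold=0.0):
    # Log-likelihood ratio test between the two models (threshold is an assumption).
    return spam_model.avg_log_prob(text) - ham_model.avg_log_prob(text) > threshold


# Toy corpora, invented purely for the demonstration.
spam_model = CharTrigramModel()
spam_model.train([
    "buy cheap viagra pills online now",
    "cheap pills online buy now",
    "viagra cheap buy online",
])
ham_model = CharTrigramModel()
ham_model.train([
    "meeting agenda for the project review",
    "please review the attached project report",
    "the meeting is moved to friday",
])

# Simulated OCR output with misrecognized characters ("4" for "a", "1" for "l"/"i").
ocr_text = "che4p viagra pi1ls onl1ne"
spam_verdict = is_spam(ocr_text, spam_model, ham_model)
```

In this sketch the extracted text is scored by both models and labeled spam when the spam model explains it better; a real filter would train on large corpora and tune the threshold on held-out mail.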

This research work is supported by the divisional grants from the Division of Business, Law and Information Sciences, University of Canberra, Australia, and the university grants from University of Canberra, Australia.

Author information

Authors and Affiliations

School of Information Sciences and Engineering, University of Canberra, Australia
Wanli Ma, Dat Tran & Dharmendra Sharma

Editor information

Marcin S. Szczuka, Daniel Howard, Dominik Ślęzak, Haeng-kon Kim, Tai-hoon Kim, Il-seok Ko, Geuk Lee, Peter M. A. Sloot

About this paper

Cite this paper

Ma, W., Tran, D., Sharma, D. (2007). Detecting Image Based Spam Email. In: Szczuka, M.S., et al. Advances in Hybrid Information Technology. ICHIT 2006. Lecture Notes in Computer Science, vol 4413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77368-9_17

DOI: https://doi.org/10.1007/978-3-540-77368-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77367-2
Online ISBN: 978-3-540-77368-9
eBook Packages: Computer Science, Computer Science (R0)