Detecting Image Based Spam Email

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4413)

Abstract

Image-based spam email can easily circumvent widely used text-based spam filters, and more and more spammers are adopting the technique. The ability to determine the nature of an email from its image content is therefore urgently needed. We propose using OCR (optical character recognition) to extract the text embedded in the images and then assessing the nature of the email from the extracted text with the same text-based engine. This approach avoids maintaining a separate image-based detection engine and benefits from the strong and reasonably mature text-based engine. Its success relies on the accuracy of the OCR. However, regardless of how good an OCR engine is, misrecognition is unavoidable. We therefore also propose a Markov model that can tolerate misspellings. The proposed solution can be integrated smoothly into existing spam email filters.
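The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-trigram model with add-one smoothing is one simple way to make a Markov model tolerant of OCR misreadings, and the names `image_parts`, `CharTrigramModel`, `is_spam`, and `classify_email` are hypothetical. The `ocr` argument is an assumed caller-supplied function that turns image bytes into text; no particular OCR engine or API is implied.

```python
import math
from collections import Counter
from collections.abc import Callable, Iterable
from email import message_from_bytes


def image_parts(raw_email: bytes) -> list[bytes]:
    """Return the decoded payload of every image/* MIME part."""
    msg = message_from_bytes(raw_email)
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if part.get_content_maintype() == "image"
    ]


class CharTrigramModel:
    """Character-level trigram Markov model with add-one smoothing.

    Assumption: scoring characters rather than whole words means a single
    misread character only disturbs the few trigrams that contain it.
    """

    def __init__(self, texts: Iterable[str]) -> None:
        self.tri: Counter[str] = Counter()   # counts of 3-character windows
        self.ctx: Counter[str] = Counter()   # counts of their 2-character contexts
        self.alphabet: set[str] = set()
        for text in texts:
            padded = "  " + text.lower()     # two spaces act as a start marker
            self.alphabet.update(padded)
            for i in range(len(padded) - 2):
                self.ctx[padded[i:i + 2]] += 1
                self.tri[padded[i:i + 3]] += 1

    def log_prob(self, text: str) -> float:
        """Log-probability of `text`; unseen trigrams get smoothed mass."""
        padded = "  " + text.lower()
        vocab = len(self.alphabet) + 1       # +1 bucket for unseen characters
        total = 0.0
        for i in range(len(padded) - 2):
            seen = self.tri[padded[i:i + 3]]
            context = self.ctx[padded[i:i + 2]]
            total += math.log((seen + 1) / (context + vocab))
        return total


def is_spam(text: str, spam: CharTrigramModel, ham: CharTrigramModel) -> bool:
    """Label text as spam when the spam model explains it better."""
    return spam.log_prob(text) > ham.log_prob(text)


def classify_email(
    raw_email: bytes,
    ocr: Callable[[bytes], str],             # hypothetical OCR callable
    spam: CharTrigramModel,
    ham: CharTrigramModel,
) -> bool:
    """Run OCR over each embedded image, then score the recovered text."""
    text = " ".join(ocr(image) for image in image_parts(raw_email))
    return is_spam(text, spam, ham)
```

In use, the two models would be trained on labelled spam and legitimate text, and a string with OCR-style errors such as "buy cheap v1agra n0w" can then be passed to `is_spam` to see whether the surviving trigrams still favour the spam model. A real filter would need far larger training corpora than a toy example and a decision threshold tuned on held-out mail.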

This research is supported by divisional grants from the Division of Business, Law and Information Sciences, University of Canberra, Australia, and by university grants from the University of Canberra, Australia.

Editor information

Marcin S. Szczuka, Daniel Howard, Dominik Ślȩzak, Haeng-kon Kim, Tai-hoon Kim, Il-seok Ko, Geuk Lee, Peter M. A. Sloot

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ma, W., Tran, D., Sharma, D. (2007). Detecting Image Based Spam Email. In: Szczuka, M.S., et al. Advances in Hybrid Information Technology. ICHIT 2006. Lecture Notes in Computer Science, vol 4413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77368-9_17

  • DOI: https://doi.org/10.1007/978-3-540-77368-9_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77367-2

  • Online ISBN: 978-3-540-77368-9

  • eBook Packages: Computer Science, Computer Science (R0)
