Image spam analysis and detection

  • Original Paper
Journal of Computer Virology and Hacking Techniques

Abstract

Image spam is unsolicited bulk email, where the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. First, we consider principal component analysis (PCA), where we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second approach focuses on the extraction of a broad set of image features and selection of an optimal subset using support vector machines (SVM). Both of these detection strategies provide high accuracy with low computational complexity. Further, we develop a new spam image dataset that cannot be detected using our PCA or SVM approach. This new dataset should prove valuable for improving image spam detection capabilities.
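The PCA scoring idea described above can be illustrated with a minimal sketch. It assumes each image has already been flattened into a fixed-length numeric vector, and it assumes the score is the smallest Euclidean distance between a projected image and the projected training images, as is common in eigenface-style methods; the abstract does not specify the exact scoring rule. The function names `fit_eigenspace` and `score`, the number of components `k`, and the synthetic data are all hypothetical and used only for illustration.

```python
import numpy as np

def fit_eigenspace(train, k):
    """Build a k-dimensional eigenspace from training vectors.

    train: array of shape (n_samples, n_features), one flattened image per row.
    Returns the training mean, the top-k principal directions, and the
    projection (weights) of every training vector onto those directions.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are orthonormal eigenvectors of the covariance matrix,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                    # shape (k, n_features)
    weights = centered @ basis.T      # shape (n_samples, k)
    return mean, basis, weights

def score(image, mean, basis, weights):
    """Assumed scoring rule: distance to the nearest training projection.

    A small score means the image lies close to the training (spam) set
    in the eigenspace; a large score means it does not.
    """
    w = (image - mean) @ basis.T
    return float(np.min(np.linalg.norm(weights - w, axis=1)))

# Synthetic stand-in for flattened spam images (not real data).
rng = np.random.default_rng(0)
spam = rng.normal(0.0, 1.0, size=(20, 64)) + np.linspace(0.0, 5.0, 64)

mean, basis, weights = fit_eigenspace(spam, k=5)
near = score(spam[0], mean, basis, weights)                    # a training image
far = score(spam[0] + 50.0 * basis[0], mean, basis, weights)   # pushed far away
print(near, far)
```

In a setup like this, an image would be classified by comparing its score against a threshold chosen on held-out data; the threshold itself is not given in the abstract.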

Author information

Corresponding author

Correspondence to Mark Stamp.

About this article

Cite this article

Annadatha, A., Stamp, M. Image spam analysis and detection. J Comput Virol Hack Tech 14, 39–52 (2018). https://doi.org/10.1007/s11416-016-0287-x

  • DOI: https://doi.org/10.1007/s11416-016-0287-x
