Abstract
Image spam is unsolicited bulk email in which the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. The first is based on principal component analysis (PCA): we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second extracts a broad set of image features and selects an optimal subset using support vector machines (SVM). Both detection strategies provide high accuracy with low computational complexity. Further, we develop a new spam image dataset that cannot be detected using either our PCA or our SVM approach. This new dataset should prove valuable for improving image spam detection capabilities.
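The PCA scoring idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The four-value "images", all function names, the power-iteration eigen-solver (used here only to avoid external libraries), and the choice of minimum Euclidean distance in the projected space as the score are assumptions made for illustration.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(row, mu):
    """Subtract the training mean from one vector."""
    return [x - m for x, m in zip(row, mu)]


def covariance(centered):
    """Covariance matrix of already-centered vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenvectors(cov, k, iters=500, seed=0):
    """Approximate the k dominant unit eigenvectors by power iteration
    with deflation (an illustrative stand-in for a library eigen-solver)."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    basis = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for deflation.
        mv = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(a * b for a, b in zip(v, mv))
        for i in range(d):
            for j in range(d):
                mat[i][j] -= lam * v[i] * v[j]
        basis.append(v)
    return basis


def project(image, mu, basis):
    """Weights of a centered image with respect to the eigenvector basis."""
    c = center(image, mu)
    return [sum(a * b for a, b in zip(c, v)) for v in basis]


def score(image, mu, basis, train_weights):
    """Assumed score: distance to the nearest training image in the
    projected space. Lower means more similar to the spam training set."""
    w = project(image, mu, basis)
    return min(math.dist(w, t) for t in train_weights)


# Hypothetical training set: each row stands in for a flattened spam image.
spam_train = [[10, 10, 0, 0], [12, 11, 1, 0], [9, 10, 0, 1], [11, 12, 1, 1]]
mu = mean_vector(spam_train)
basis = top_eigenvectors(covariance([center(r, mu) for r in spam_train]), k=2)
train_weights = [project(r, mu, basis) for r in spam_train]

spam_like_score = score([10, 11, 0, 1], mu, basis, train_weights)
ham_like_score = score([0, 1, 10, 12], mu, basis, train_weights)
print(spam_like_score, ham_like_score)
```

In this toy setting, an image resembling the training set receives a lower score than a dissimilar one, so a threshold on the score would act as the classifier. Real images would be flattened pixel arrays of much higher dimension, where a library eigen-solver is the practical choice.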
Cite this article
Annadatha, A., Stamp, M. Image spam analysis and detection. J Comput Virol Hack Tech 14, 39–52 (2018). https://doi.org/10.1007/s11416-016-0287-x
DOI: https://doi.org/10.1007/s11416-016-0287-x