Image spam analysis and detection

Annadatha, Annapurna; Stamp, Mark

doi:10.1007/s11416-016-0287-x

Original Paper
Published: 14 October 2016

Volume 14, pages 39–52, (2018)

Journal of Computer Virology and Hacking Techniques

Annapurna Annadatha¹ &
Mark Stamp¹

877 Accesses
30 Citations

Abstract

Image spam is unsolicited bulk email in which the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. The first is based on principal component analysis (PCA): we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second extracts a broad set of image features and selects an optimal subset using support vector machines (SVM). Both detection strategies provide high accuracy with low computational complexity. Further, we develop a new spam image dataset whose images cannot be detected using either our PCA or our SVM approach. This new dataset should prove valuable for improving image spam detection capabilities.
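
The PCA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how scores are computed, so this example assumes an eigenfaces-style score, namely the smallest Euclidean distance between a projected image and the projected training images. The function names (train_eigenspace, score), the choice k = 5, and the synthetic "images" are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def train_eigenspace(train, k):
    """Build a k-dimensional eigenspace from flattened training images.

    train: array of shape (n_images, n_pixels), one flattened image per row.
    Returns the mean image, the top-k eigenvectors, and the training weights.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # The rows of vt are eigenvectors of the covariance matrix of the
    # training set, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                    # shape (k, n_pixels)
    weights = centered @ basis.T      # shape (n_images, k)
    return mean, basis, weights


def score(image, mean, basis, weights):
    """Project an image onto the eigenspace and return the distance to the
    nearest projected training image (assumed scoring rule; lower means
    the image is closer to the training set)."""
    w = (image - mean) @ basis.T
    return float(np.min(np.linalg.norm(weights - w, axis=1)))


# Synthetic stand-ins for flattened images (assumption: 64 pixels each).
rng = np.random.default_rng(0)
template = rng.random(64)
spam_train = template + 0.05 * rng.standard_normal((20, 64))

mean, basis, weights = train_eigenspace(spam_train, k=5)

similar = template + 0.05 * rng.standard_normal(64)   # resembles training set
unrelated = rng.random(64)                            # drawn independently

print("score of similar image:  ", score(similar, mean, basis, weights))
print("score of unrelated image:", score(unrelated, mean, basis, weights))
```

Under this assumed scoring rule, an image would be classified by comparing its score against a threshold chosen on labeled data. The threshold and the number of eigenvectors k are tuning parameters that the abstract does not specify.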

Author information

Authors and Affiliations

Department of Computer Science, San Jose State University, San Jose, USA
Annapurna Annadatha & Mark Stamp

Corresponding author

Correspondence to Mark Stamp.

About this article

Cite this article

Annadatha, A., Stamp, M. Image spam analysis and detection. J Comput Virol Hack Tech 14, 39–52 (2018). https://doi.org/10.1007/s11416-016-0287-x

Received: 18 June 2016
Accepted: 28 September 2016
Published: 14 October 2016
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11416-016-0287-x
