DOI: 10.1145/1367497.1367565
research-article

Detecting image spam using visual features and near duplicate detection

Published: 21 April 2008

Abstract

Email spam is a much-studied topic. Although current spam-detection software has been gaining a competitive edge against text-based email spam, advances in spam generation have posed a new challenge: image-based spam. Image-based spam is email that embeds images carrying the spam message, so the message arrives in binary rather than textual form. In this paper, we study the characteristics of image spam, propose two solutions for detecting it, and compare them with existing techniques. The first solution, which uses visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers on judiciously chosen color, texture and shape features. The second solution offers a novel approach to near-duplicate detection in images. It clusters image GMMs (Gaussian Mixture Models) according to the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon (JS) divergence as the distance measure.
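The second solution's clustering step can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract clusters image GMMs, and the JS divergence between two Gaussian mixtures has no closed form, so this sketch substitutes small discrete histograms as a stated simplification. The merge cost follows the standard AIB formulation, (w_a + w_b) times the JS divergence with weights proportional to cluster mass. All function names (`kl`, `js`, `merge_cost`, `aib_cluster`) and the toy histograms are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence; w_p and w_q must sum to 1."""
    m = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl(p, m) + w_q * kl(q, m)


def merge_cost(w_a, p_a, w_b, p_b):
    """AIB cost of merging two clusters with masses w_a, w_b."""
    total = w_a + w_b
    return total * js(p_a, p_b, w_a / total, w_b / total)


def aib_cluster(dists, n_clusters):
    """Greedily merge the cheapest pair until n_clusters remain.

    Each cluster is a tuple (member_indices, mass, distribution).
    Assumption: every input starts with equal mass 1/n.
    """
    n = len(dists)
    clusters = [([i], 1.0 / n, list(d)) for i, d in enumerate(dists)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = merge_cost(clusters[a][1], clusters[a][2],
                                  clusters[b][1], clusters[b][2])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (m_a, w_a, p_a), (m_b, w_b, p_b) = clusters[a], clusters[b]
        w = w_a + w_b
        # The merged distribution is the mass-weighted average of the two.
        merged = (m_a + m_b, w,
                  [(w_a * x + w_b * y) / w for x, y in zip(p_a, p_b)])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


# Toy data: two pairs of near-duplicate "images" as 3-bin histograms.
histograms = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
]
groups = sorted(sorted(members) for members, _, _ in aib_cluster(histograms, 2))
```

With the toy histograms above, the two near-duplicate pairs end up in separate clusters. A real pipeline would need an approximation of the divergence between fitted mixtures in place of the exact histogram computation used here.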

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. email spam
  2. image analysis
  3. machine learning

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
