Abstract
Spammers often embed text in images to evade text-based spam filters, which has resulted in a large volume of image-based advertising spam. Spam image recognition has become one of the most active topics in Internet spam filtering research. Its goal is to address the sharp decline in performance, or even outright failure, that traditional spam filtering methods suffer when applied to spam images. This paper proposes a clustering-based method for expanding the data samples, which greatly increases the number of high-quality training samples and meets the needs of model training. We then train a convolutional neural network on the enlarged data set to recognize spam in real time. The experimental results show that the accuracy of the model increases by more than 14% after applying the proposed data augmentation method, and by 6% compared with other data augmentation methods. By combining convolutional neural networks with the proposed data augmentation method, our spam filtering model achieves an accuracy 7–11% higher than that of the traditional method.
Acknowledgments
The authors would like to thank the reviewers for their helpful advice. The National Youth Science Foundation project of China (Grant No. F020101), the Henan Province Science and Technology Key Project (Grant No. 1521022101936), the Natural Science Foundation of Hunan Province, China (Grant No. 2018JJ2023), and the Key Projects of Science and Technology Research in Henan Education Department (Grant Nos. 15A520091, 17B520031) are gratefully acknowledged.
Cite this article
Aiwan, F., Zhaofeng, Y. Image spam filtering using convolutional neural networks. Pers Ubiquit Comput 22, 1029–1037 (2018). https://doi.org/10.1007/s00779-018-1168-8
DOI: https://doi.org/10.1007/s00779-018-1168-8