An intelligent character recognition method to filter spam images on cloud

Chen, Jun; Zhao, Hong; Yang, Jufeng; Zhang, Jian; Li, Tao; Wang, Kai

doi:10.1007/s00500-015-1811-5


Methodologies and Application
Published: 25 July 2015

Volume 21, pages 753–763, (2017)

Soft Computing



Abstract

Cloud storage has become an important means of data sharing in recent years. Protecting data for its owners and filtering harmful data for its recipients are two problems in cloud storage that cannot be neglected. Illegal or unsuitable messages on the cloud have a negative impact on minors, and such messages are easily converted into images to evade text-based filtering. Detecting spam images that embed harmful messages therefore requires soft computing methods for intelligent character recognition. The histogram of oriented gradients (HOG), proposed by Dalal and Triggs, has proven to be one of the best features for intelligent character recognition. When HOG is applied to recognize a whole word, a pre-defined sliding window is typically used to generate candidate character images. However, because characters differ in size, the pre-defined window cannot match each character exactly. The character image to be recognized therefore usually exhibits variations in scale and translation, which strongly affect recognition performance. To solve this problem, this paper proposes STRHOG, an extended version of HOG. Experiments on two public datasets and one dataset of our own show encouraging results. The improved intelligent character recognition helps filter spam images on the cloud. To allow a fair comparison with other methods, a nearest neighbor classifier is used for recognition. Performance is expected to improve further with better classifiers such as a fuzzy neural network.
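To make the sensitivity to translation concrete, the following is a minimal sketch of a plain, simplified HOG descriptor combined with a nearest neighbor classifier. It is not the STRHOG method of the paper, whose details are not given in the abstract. The function names (`hog_descriptor`, `nearest_neighbor`, `bar_glyph`), the 8×8 toy glyphs, the cell size and the bin count are all assumptions chosen for illustration.

```python
import math


def hog_descriptor(image, cell=4, bins=9):
    """Simplified HOG (illustrative, not STRHOG): per-cell histograms of
    unsigned gradient orientation, weighted by gradient magnitude and
    L2-normalised over the whole descriptor (no block normalisation)."""
    h, w = len(image), len(image[0])
    cells_y, cells_x = h // cell, w // cell
    hist = [0.0] * (cells_y * cells_x * bins)
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            # Central differences, with coordinates clamped at the borders.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation folded into [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            idx = ((y // cell) * cells_x + (x // cell)) * bins + b
            hist[idx] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def nearest_neighbor(query, templates):
    """Return (label, squared Euclidean distance) of the closest template."""
    best_label, best_dist = None, float("inf")
    for label, desc in templates:
        d = sum((a - b) ** 2 for a, b in zip(query, desc))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def bar_glyph(size=8, vertical=True, offset=3, thickness=2):
    """Hypothetical toy glyph: one vertical or horizontal bar on a blank image."""
    return [
        [1.0 if offset <= (x if vertical else y) < offset + thickness else 0.0
         for x in range(size)]
        for y in range(size)
    ]


# Two labelled templates stand in for a training set.
templates = [
    ("I", hog_descriptor(bar_glyph(vertical=True))),
    ("-", hog_descriptor(bar_glyph(vertical=False))),
]

# The same vertical bar, seen through a window that is off by one pixel.
shifted = hog_descriptor(bar_glyph(vertical=True, offset=4))
label, dist = nearest_neighbor(shifted, templates)
print(label, dist)
```

In this toy case the shifted bar is still matched to the correct template, but its distance to that template is no longer zero: a one-pixel misalignment of the window moves gradient energy between cells. With real characters of varying size, such shifts can be large enough to change the nearest neighbor, which is the problem the abstract describes.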


References

Attar A, Rad RM, Atani RE (2013) A survey of image spamming and filtering techniques 40:71–105
Biggio B, Fumera G, Pillai I, Roli F (2011) A survey and experimental evaluation of image spam filtering techniques. Pattern Recogn Lett 32(10):1436–1446
Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29:63–92
Caruana G, Li M (2012) A survey of emerging approaches to spam filtering. ACM Comput Surv 44(2) (article 9)
Cho MS, Seok JH, Lee S, Kim JH (2011) Scene text extraction by superpixel crfs combining multiple character features. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1034–1038
Creusen IM, Wijnhoven RGJ, Herbschleb E, de With PHN (2010) Color exploitation in hog-based traffic sign detection. In: Proc. of the 17th Int’l Conf. on Image Processing. IEEE, pp 2669–2672
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 886–893
de Campos TE, Babu BR, Varma M (2009) Character recognition in natural images. In: Proc. of the Int’l Conf. on Computer Vision Theory and Application
Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 2963–2970
Esposito C, Ficco M, Palmieri F, Castiglione A (2015) Smart cloud storage service selection based on fuzzy logic, theory of evidence and game theory. IEEE Trans Comput
Garg R, Hassan E, Chaudhury S, Gopal M (2011) A crf based scheme for overlapping multi-colored text graphics separation. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1215–1219
Ghebghoub Y, Boussaid O, Oukid S (2014) Security model based encryption to protect data on cloud. In: Proc. of the International Conference on Information Systems and Design of Communication. pp 50–55
He JY, Li SF (2004) Hybrid chinese/english text identification in web images. In: Proc. of the 3rd Int’l Conf. on Image and Graphics. pp 361–364
Iwamura M, Kobayashi T, Kise K (2011) Recognition of multiple characters in a scene image using arrangement of local features. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1409–1413
Janeela Theresa MM, Joseph Raj V (2015) A maximum spanning tree-based dynamic fuzzy supervised neural network architecture for classification of murder cases. In: Soft Computing
Karatzas D, Mestre SR, Mas J, Nourbakhsh F, Roy PP (2011) ICDAR 2011 robust reading competition challenge 1: reading text in born-digital images (web and email). In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1485–1490
Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoust Speech Signal Process 29(6):1153–1160
Lee C, Chung P, Hwang M (2013) A survey on attribute-based encryption schemes of access control in cloud environments. Int J Netw Secur 15(4):231–240
Lee JJ, Lee PH, Lee SW, Yuille A, Koch C (2011) Adaboost for text detection in natural scene. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 429–434
Li J, Kim K (2010) Hidden attribute-based signatures without anonymity revocation. Inf Sci 180(9):1681–1689
Li J, Wang Q, Wang C, Cao N, Ren K, Lou W (2010) Fuzzy keyword search over encrypted data in cloud computing. In: Proc. of the 29th IEEE International Conference on Computer Communications(INFOCOM 2010). IEEE, pp 441–445
Li J, Huang X, Li J, Chen X, Xiang Y (2014) Securely outsourcing attribute-based encryption with checkability. IEEE Trans Parallel Distrib Syst 25(8):2201–2210
Li J, Li J, Chen X, Jia C, Lou W (2015) Identity-based encryption with outsourced revocation in cloud computing. In: IEEE Transactions on Computer. IEEE (available online)
Li M, Yu S, Zheng Y, Ren K, Lou W (2013) Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption. IEEE Trans Parallel Distrib Syst 24(1):131–143
Liu J, Zhang SW, Li HP, Liang W (2011) A chinese character localization method based on intergrating structure and cc-clustering for advertising images. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1044–1048
Liu Z, Li J, Chen X, Yang J, Jia C (2014) Thin-model data sharing scheme supporting keyword search in cloud storage. In: 19th Australasian Conference on Information Security and Privacy (ACISP)
Mishra A, Alahari K, Jawahar CV (2011) An mrf model for binarization of natural scene text. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 11–16
Mishra A, Alahari K, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 2687–2694
Nagy G (2000) Twenty years of document image analysis in pami. IEEE Trans Pattern Anal Mach Intell 22(1):38–62
Newell AJ, Griffin LD (2011) Multiscale histogram of oriented gradient descriptors for robust character recognition. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1085–1089
Pan YF, Hou XW, Liu CL (2011) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813
Perantonis SJ, Gatos B, Maragos V (2003) A novel web image processing algorithm for text area identification that helps commercial ocr engines to improve their web image recognition efficiency. In: Proc. of the 2nd Int’l Workshop on Web Document Analysis. pp 61–64
Perantonis SJ, Gatos B, Maragos V, Karkaletsis V, Petasis G (2004) Text area identification in web images. In: Proc. of Methods and Applications of Artificial Intelligence. pp 82–92
Situ LJ, Liu RZ, Tan CL (2011) Text localization in web images using probabilistic candidate selection model. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1359–1363
Uchida S, Shigeyoshi Y, Kunishige Y, Yaokai F (2011) A keypoint-based approach toward scenery character detection. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 819–823
Wakahara T, Kita K (2011) Binarization of color character strings in scene images using k-means clustering and support vector machines. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 274–278
Wang K, Belongie S (2010) Word spotting in the wild. In: Proc. of the European Conf. on Computer Vision. IEEE, pp 591–604
Wang K, Babenko B, Belongie S (2011a) End-to-end scene text recognition. In: Proc. of the 13th Int’l Conf. on Computer Vision. IEEE, pp 1457–1464
Wang XF, Huang L, Liu CP (2011b) A novel method for embedded text segmentation based on stroke and color. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 151–155
Weinman JJ, Learned-Miller E, Hanson A (2009) Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Trans Pattern Anal Mach Intell 31(10):1733–1746
Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 4042–4049
Yi CC, Tian YL (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
Zhang HW, Liu CS, Yang C, Ding XQ (2011) An improved scene text extraction method using conditional random field and optical character recognition. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 708–712
Zhao X, Lin KH, Fu Y, Hu Y, Liu Y, Huang TS (2011) Text from corners: a novel approach to detect text and caption in videos. IEEE Trans Image Process 20(3):790–799
Zheng Q, Chen K, Zhou Y, Gu C, Guan H (2010) Text localization and recognition in complex scenes using local features. In: Proc. of the 10th Asian Conf. on Computer Vision. Springer, pp 121–132


Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61201424, 61301238, 61212005, the Tianjin Fundamental Research Funds under Grant Nos. 14JCTPJC00501, 14JCTPJC00556, and the Natural Science Foundation of Tianjin, China under Grant Nos. 12JCYBJC10100, 14ZCDZGX00831.

Author information

Authors and Affiliations

College of Computer and Control Engineering, Nankai University, Tianjin, People’s Republic of China
Jun Chen, Hong Zhao, Jufeng Yang, Jian Zhang, Tao Li & Kai Wang


Corresponding author

Correspondence to Kai Wang.

Additional information

Communicated by V. Loia.


About this article

Cite this article

Chen, J., Zhao, H., Yang, J. et al. An intelligent character recognition method to filter spam images on cloud. Soft Comput 21, 753–763 (2017). https://doi.org/10.1007/s00500-015-1811-5


Published: 25 July 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00500-015-1811-5
