An intelligent character recognition method to filter spam images on cloud

  • Methodologies and Application
Soft Computing

Abstract

Cloud storage has become an important means of data sharing in recent years. Protecting data for data owners and filtering harmful data for data recipients are two problems in cloud storage that cannot be neglected. Illegal or unsuitable messages on the cloud have a negative impact on minors, and such messages are easily converted into images to evade text-based filtering. Detecting spam images with embedded harmful messages therefore requires soft computing methods for intelligent character recognition. The histogram of oriented gradients (HOG), proposed by Dalal and Triggs, has so far been shown to be one of the most effective features for intelligent character recognition. When HOG is applied to recognize a whole word, a pre-defined sliding window is commonly used to generate candidate character images. However, because characters differ in size, a pre-defined window cannot fit every character exactly. The character images to be recognized therefore usually exhibit variations in scale and translation, which strongly affect recognition performance. To address this problem, this paper proposes STRHOG, an extended version of HOG. Experiments on two public datasets and one dataset of our own have shown encouraging results. The improved intelligent character recognition is helpful for filtering spam images on the cloud. To allow a fair comparison with other methods, a nearest neighbor classifier is used for recognition. We expect performance to improve further with better classifiers, such as a fuzzy neural network.
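The baseline pipeline described above (plain HOG features computed on fixed-width sliding-window candidates, then classified with a nearest neighbor classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's STRHOG, whose details are not given here. The cell size, bin count, window width, nearest-neighbor resizing of each crop to a fixed size, and the toy glyphs are all hypothetical choices, and the overlapping-block normalization of Dalal and Triggs is omitted for brevity.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple

Image = List[List[float]]  # grayscale, row-major, intensities in [0, 1]


def hog(img: Image, cell: int = 4, bins: int = 9) -> List[float]:
    """Simplified HOG: magnitude-weighted histograms of unsigned gradient
    orientation per cell, L2-normalized over the whole descriptor.
    (Assumption: no overlapping-block normalization as in Dalal and Triggs.)"""
    h, w = len(img), len(img[0])
    cy, cx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cx)] for _ in range(cy)]
    for y in range(cy * cell):
        for x in range(cx * cell):
            # Centered [-1, 0, 1] differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned: [0, 180)
            b = min(int(ang * bins / 180.0), bins - 1)
            hist[y // cell][x // cell][b] += mag
    vec = [v for row in hist for c in row for v in c]
    norm = math.sqrt(sum(v * v for v in vec)) + 1e-6
    return [v / norm for v in vec]


def resize(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbor resampling to a fixed size (hypothetical stand-in
    for bilinear or bicubic interpolation)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def sliding_windows(word: Image, win_w: int, step: int) -> Iterator[Tuple[int, Image]]:
    """Yield (x offset, crop) for fixed-width windows spanning the full height."""
    width = len(word[0])
    for x in range(0, width - win_w + 1, step):
        yield x, [row[x:x + win_w] for row in word]


def nearest_neighbor(query: Sequence[float],
                     templates: Sequence[Tuple[str, List[float]]]) -> Optional[str]:
    """1-NN: return the label of the template closest in Euclidean distance."""
    best_label, best_d = None, math.inf
    for label, vec in templates:
        d = sum((a - b) ** 2 for a, b in zip(query, vec))
        if d < best_d:
            best_label, best_d = label, d
    return best_label


def glyph(kind: str, size: int) -> Image:
    """Hypothetical toy glyphs: a vertical bar ('I') or a horizontal bar ('-')."""
    lo, hi = size * 3 // 8, size * 5 // 8
    return [[1.0 if lo <= (x if kind == "I" else y) < hi else 0.0
             for x in range(size)] for y in range(size)]


SIDE = 16  # every candidate window is resized to SIDE x SIDE before HOG
templates = [(k, hog(glyph(k, SIDE))) for k in ("I", "-")]

# A toy "word" whose characters (24 px) are larger than the templates.
a, b = glyph("I", 24), glyph("-", 24)
word = [ra + rb for ra, rb in zip(a, b)]

labels = [nearest_neighbor(hog(resize(crop, SIDE, SIDE)), templates)
          for _, crop in sliding_windows(word, win_w=24, step=24)]
print(labels)  # ['I', '-']
```

In this toy example each character exactly fills its window, so resizing recovers the template and the 1-NN match is perfect. With real text, a fixed window generally crops characters off-center or at the wrong scale; plain HOG does not compensate for those scale and translation variations, which is the problem the abstract says STRHOG targets.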

Figs. 1–11

References

  • Attar A, Rad RM, Atani RE (2013) A survey of image spamming and filtering techniques 40:71–105

  • Biggio B, Fumera G, Pillai I, Roli F (2011) A survey and experimental evaluation of image spam filtering techniques. Pattern Recogn Lett 32(10):1436–1446

  • Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29:63–92

  • Caruana G, Li M (2012) A survey of emerging approaches to spam filtering. ACM Comput Surv 44(2) (article 9)

  • Cho MS, Seok JH, Lee S, Kim JH (2011) Scene text extraction by superpixel CRFs combining multiple character features. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1034–1038

  • Creusen IM, Wijnhoven RGJ, Herbschleb E, de With PHN (2010) Color exploitation in HOG-based traffic sign detection. In: Proc. of the 17th Int’l Conf. on Image Processing. IEEE, pp 2669–2672

  • Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 886–893

  • de Campos TE, Babu BR, Varma M (2009) Character recognition in natural images. In: Proc. of the Int’l Conf. on Computer Vision Theory and Application

  • Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 2963–2970

  • Esposito C, Ficco M, Palmieri F, Castiglione A (2015) Smart cloud storage service selection based on fuzzy logic, theory of evidence and game theory. IEEE Trans Comput

  • Garg R, Hassan E, Chaudhury S, Gopal M (2011) A CRF-based scheme for overlapping multi-colored text graphics separation. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1215–1219

  • Ghebghoub Y, Boussaid O, Oukid S (2014) Security model based encryption to protect data on cloud. In: Proc. of the International Conference on Information Systems and Design of Communication. pp 50–55

  • He JY, Li SF (2004) Hybrid Chinese/English text identification in web images. In: Proc. of the 3rd Int’l Conf. on Image and Graphics. pp 361–364

  • Iwamura M, Kobayashi T, Kise K (2011) Recognition of multiple characters in a scene image using arrangement of local features. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1409–1413

  • Janeela Theresa MM, Joseph Raj V (2015) A maximum spanning tree-based dynamic fuzzy supervised neural network architecture for classification of murder cases. Soft Comput

  • Karatzas D, Mestre SR, Mas J, Nourbakhsh F, Roy PP (2011) ICDAR 2011 robust reading competition challenge 1: reading text in born-digital images (web and email). In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1485–1490

  • Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoust Speech Signal Process 29(6):1153–1160

  • Lee C, Chung P, Hwang M (2013) A survey on attribute-based encryption schemes of access control in cloud environments. Int J Netw Secur 15(4):231–240

  • Lee JJ, Lee PH, Lee SW, Yuille A, Koch C (2011) AdaBoost for text detection in natural scene. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 429–434

  • Li J, Kim K (2010) Hidden attribute-based signatures without anonymity revocation. Inf Sci 180(9):1681–1689

  • Li J, Wang Q, Wang C, Cao N, Ren K, Lou W (2010) Fuzzy keyword search over encrypted data in cloud computing. In: Proc. of the 29th IEEE International Conference on Computer Communications (INFOCOM 2010). IEEE, pp 441–445

  • Li J, Huang X, Li J, Chen X, Xiang Y (2014) Securely outsourcing attribute-based encryption with checkability. IEEE Trans Parallel Distrib Syst 25(8):2201–2210

  • Li J, Li J, Chen X, Jia C, Lou W (2015) Identity-based encryption with outsourced revocation in cloud computing. IEEE Trans Comput (available online)

  • Li M, Yu S, Zheng Y, Ren K, Lou W (2013) Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption. IEEE Trans Parallel Distrib Syst 24(1):131–143

  • Liu J, Zhang SW, Li HP, Liang W (2011) A Chinese character localization method based on integrating structure and CC-clustering for advertising images. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1044–1048

  • Liu Z, Li J, Chen X, Yang J, Jia C (2014) Thin-model data sharing scheme supporting keyword search in cloud storage. In: 19th Australasian Conference on Information Security and Privacy (ACISP)

  • Mishra A, Alahari K, Jawahar CV (2011) An MRF model for binarization of natural scene text. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 11–16

  • Mishra A, Alahari K, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 2687–2694

  • Nagy G (2000) Twenty years of document image analysis in PAMI. IEEE Trans Pattern Anal Mach Intell 22(1):38–62

  • Newell AJ, Griffin LD (2011) Multiscale histogram of oriented gradient descriptors for robust character recognition. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1085–1089

  • Pan YF, Hou XW, Liu CL (2011) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813

  • Perantonis SJ, Gatos B, Maragos V (2003) A novel web image processing algorithm for text area identification that helps commercial OCR engines to improve their web image recognition efficiency. In: Proc. of the 2nd Int’l Workshop on Web Document Analysis. pp 61–64

  • Perantonis SJ, Gatos B, Maragos V, Karkaletsis V, Petasis G (2004) Text area identification in web images. In: Proc. of Methods and Applications of Artificial Intelligence. pp 82–92

  • Situ LJ, Liu RZ, Tan CL (2011) Text localization in web images using probabilistic candidate selection model. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 1359–1363

  • Uchida S, Shigeyoshi Y, Kunishige Y, Yaokai F (2011) A keypoint-based approach toward scenery character detection. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 819–823

  • Wakahara T, Kita K (2011) Binarization of color character strings in scene images using k-means clustering and support vector machines. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 274–278

  • Wang K, Belongie S (2010) Word spotting in the wild. In: Proc. of the European Conf. on Computer Vision. Springer, pp 591–604

  • Wang K, Babenko B, Belongie S (2011a) End-to-end scene text recognition. In: Proc. of the 13th Int’l Conf. on Computer Vision. IEEE, pp 1457–1464

  • Wang XF, Huang L, Liu CP (2011b) A novel method for embedded text segmentation based on stroke and color. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 151–155

  • Weinman JJ, Learned-Miller, Hanson A (2009) Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Trans Pattern Anal Mach Intell 31(10):1733–1746

  • Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, pp 4042–4049

  • Yi CC, Tian YL (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605

  • Zhang HW, Liu CS, Yang C, Ding XQ (2011) An improved scene text extraction method using conditional random field and optical character recognition. In: Proc. of the 11th Int’l Conf. on Document Analysis and Recognition. IEEE, pp 708–712

  • Zhao X, Lin KH, Fu Y, Hu Y, Liu Y, Huang TS (2011) Text from corners: a novel approach to detect text and caption in videos. IEEE Trans Image Process 20(3):790–799

  • Zheng Q, Chen K, Zhou Y, Gu C, Guan H (2010) Text localization and recognition in complex scenes using local features. In: Proc. of the 10th Asian Conf. on Computer Vision. Springer, pp 121–132

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61201424, 61301238, and 61212005, the Tianjin Fundamental Research Funds under Grant Nos. 14JCTPJC00501 and 14JCTPJC00556, and the Natural Science Foundation of Tianjin, China under Grant Nos. 12JCYBJC10100 and 14ZCDZGX00831.

Author information

Corresponding author

Correspondence to Kai Wang.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Chen, J., Zhao, H., Yang, J. et al. An intelligent character recognition method to filter spam images on cloud. Soft Comput 21, 753–763 (2017). https://doi.org/10.1007/s00500-015-1811-5

  • DOI: https://doi.org/10.1007/s00500-015-1811-5
