A machine learning attack against variable-length Chinese character CAPTCHAs

Applied Intelligence

Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is widely used as a standard security mechanism to protect website resources. Among the various kinds of CAPTCHAs, text-based CAPTCHAs are the most popular; their characters may be drawn from English letters, Arabic digits, and other character sets such as Chinese characters. Because of the large number of Chinese characters and their complicated structure, Chinese character CAPTCHAs are difficult for bots to crack, and they have therefore been widely deployed in China. Nevertheless, effective attack approaches are needed to help CAPTCHA designers find security vulnerabilities and improve their defense mechanisms. To deal with variable-length Chinese character CAPTCHAs with noise, an automatic attack approach is proposed that consists of preprocessing, character segmentation, and character recognition. For character recognition, two methods are proposed: MGLCR (Multi-scale Gabor and Logistic regression based CAPTCHA Recognition) and CCR (Convolutional neural network based CAPTCHA Recognition). MGLCR extracts features with multi-scale Gabor filters and classifies characters with logistic regression. CCR uses a CNN (Convolutional Neural Network) to extract features and recognize characters automatically. Experimental results show that the proposed approaches are effective in attacking variable-length Chinese character CAPTCHAs with noise and that both MGLCR and CCR outperform state-of-the-art methods; the pros and cons of the two methods are discussed. In addition, the proposed methods achieve satisfactory results in breaking mixed-character CAPTCHAs composed of English letters, Arabic digits, Chinese characters, and mathematical operators.
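
The MGLCR idea (features from a bank of Gabor filters at several scales and orientations, fed to a logistic-regression classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter-bank values (SCALES, ORIENTATIONS, gamma), the pooling of each filter response into its mean magnitude and standard deviation, the plain gradient-descent softmax trainer, and the synthetic noisy stripe images used in place of segmented character images are all hypothetical choices made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed filter-bank settings (not from the paper): two scales given as
# (sigma, wavelength) pairs, and four orientations.
SCALES = [(2.0, 4.0), (3.0, 6.0)]
ORIENTATIONS = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]


def gabor_kernel(sigma, theta, wavelength, gamma=0.5):
    """Real (even-symmetric) 2-D Gabor kernel, shifted to zero mean."""
    half = int(np.ceil(3 * sigma))  # cover roughly +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.cos(2 * np.pi * x_rot / wavelength)
    return kernel - kernel.mean()  # uniform regions then give no response


def filter2d(image, kernel):
    """Same-size 2-D correlation with reflect padding (equal to convolution
    here, because the Gabor kernel is point-symmetric)."""
    half = kernel.shape[0] // 2
    padded = np.pad(image, half, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)  # shape (H, W, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)


BANK = [gabor_kernel(s, t, w) for s, w in SCALES for t in ORIENTATIONS]


def gabor_features(image):
    """Pool each filter response into two numbers: mean magnitude and std."""
    feats = []
    for kernel in BANK:
        response = filter2d(image, kernel)
        feats += [np.abs(response).mean(), response.std()]
    return np.array(feats)


def train_softmax(X, y, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)


def make_stripes(n, vertical, rng, size=32, period=6.0):
    """Hypothetical stand-in for segmented character images: noisy stripes."""
    coords = np.arange(size)
    images = []
    for _ in range(n):
        wave = 0.5 + 0.5 * np.cos(2 * np.pi * coords / period + rng.uniform(0, 2 * np.pi))
        img = np.tile(wave, (size, 1)) if vertical else np.tile(wave[:, None], (1, size))
        images.append(img + rng.normal(0.0, 0.2, (size, size)))
    return images


def make_dataset(n_per_class, rng):
    images = make_stripes(n_per_class, False, rng) + make_stripes(n_per_class, True, rng)
    X = np.stack([gabor_features(img) for img in images])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(20, rng)
mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8  # standardize features
W, b = train_softmax((X_train - mu) / sd, y_train, n_classes=2)
X_test, y_test = make_dataset(20, rng)
accuracy = float(np.mean(predict(W, b, (X_test - mu) / sd) == y_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Pooling each response into two statistics keeps the feature vector small (2 scales × 4 orientations × 2 statistics = 16 values here). A classifier over thousands of Chinese character classes would probably need spatially resolved Gabor features, such as downsampled response maps, rather than two global statistics per filter.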

Acknowledgements

This paper is supported by the National Natural Science Foundation of China under grant No. 61303094, by the Science and Technology Commission of Shanghai Municipality under grants No. 16111107800 and No. 16511102400, by the Innovation Program of Shanghai Municipal Education Commission under grant No. 14YZ024, and by the Shanghai Key Laboratory of Financial Information Technology.

Author information

Correspondence to Xing Wu.

About this article

Cite this article

Wu, X., Dai, S., Guo, Y. et al. A machine learning attack against variable-length Chinese character CAPTCHAs. Appl Intell 49, 1548–1565 (2019). https://doi.org/10.1007/s10489-018-1342-8
