Abstract
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is widely used as a standard security mechanism to protect resources on websites. Among the various kinds of CAPTCHAs, the text-based CAPTCHA is the most popular scheme. It may consist of English letters, Arabic digits and other character sets, such as Chinese characters. Because Chinese characters are numerous and structurally complicated, Chinese character CAPTCHAs are difficult for bots to crack and have therefore been widely applied in China. Nevertheless, effective attack approaches are necessary to help CAPTCHA designers find security vulnerabilities and improve defense mechanisms. To deal with noisy, variable-length Chinese character CAPTCHAs, an automatic attack approach is proposed, which consists of preprocessing, character segmentation and character recognition. For character recognition, two methods are proposed: MGLCR (Multi-scale Gabor and Logistic regression based CAPTCHA Recognition) and CCR (Convolutional neural network based CAPTCHA Recognition). MGLCR extracts features with multi-scale Gabor filters and classifies characters with logistic regression. CCR extracts features and recognizes characters automatically with a CNN (Convolutional Neural Network). Experimental results show that the proposed approaches are efficient in attacking noisy, variable-length Chinese character CAPTCHAs. Both MGLCR and CCR outperform state-of-the-art methods, and their respective pros and cons are discussed. In addition, the proposed methods achieve satisfactory results in breaking mixed-character CAPTCHAs, which consist of English letters, Arabic digits, Chinese characters and mathematical operators.
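The MGLCR idea described in the abstract (multi-scale Gabor features fed to logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel size, wavelengths, number of orientations, the `sigma = 0.56 * wavelength` rule, the mean-absolute-response pooling, and all function names are assumptions chosen for illustration. The toy data are striped 8x8 images rather than Chinese characters, and the classifier is binary rather than multi-class.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a Gabor filter as a size x size list of lists."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_energy(image, kernel):
    """Mean absolute correlation response over all valid positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            resp = sum(kernel[a][b] * image[i + a][j + b]
                       for a in range(k) for b in range(k))
            total += abs(resp)
            count += 1
    return total / count

def gabor_features(image, wavelengths=(2.0, 4.0), n_orient=4, size=5):
    """One energy value per (scale, orientation) pair; parameters are assumed."""
    feats = []
    for lam in wavelengths:
        for o in range(n_orient):
            theta = o * math.pi / n_orient
            kernel = gabor_kernel(size, lam, theta, sigma=0.56 * lam)
            feats.append(filter_energy(image, kernel))
    return feats

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=200):
    """Binary logistic regression trained with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy stand-ins for two "character classes": vertical vs. horizontal stripes.
vertical = [[j % 2 for j in range(8)] for i in range(8)]
horizontal = [[i % 2 for j in range(8)] for i in range(8)]
w, b = train_logistic([gabor_features(vertical), gabor_features(horizontal)], [0, 1])
```

A real attack on character images would need one output per character class (for example, multinomial logistic regression) and far more training data; the sketch only shows how orientation- and scale-selective filter responses become a feature vector for a linear classifier.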
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant No. 61303094, by the Science and Technology Commission of Shanghai Municipality under grants No. 16111107800 and No. 16511102400, by the Innovation Program of Shanghai Municipal Education Commission under grant No. 14YZ024, and by the Shanghai Key Laboratory of Financial Information Technology.
About this article
Cite this article
Wu, X., Dai, S., Guo, Y. et al. A machine learning attack against variable-length Chinese character CAPTCHAs. Appl Intell 49, 1548–1565 (2019). https://doi.org/10.1007/s10489-018-1342-8
DOI: https://doi.org/10.1007/s10489-018-1342-8