A machine learning attack against variable-length Chinese character CAPTCHAs

Wu, Xing; Dai, Shuji; Guo, Yike; Fujita, Hamido

doi:10.1007/s10489-018-1342-8

Published: 20 November 2018

Applied Intelligence, volume 49, pages 1548–1565 (2019)

Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is widely used as a standard security mechanism to protect resources on websites. Among the various kinds of CAPTCHAs, the text-based CAPTCHA is the most popular scheme; it may consist of English letters, Arabic digits and other character sets, such as Chinese characters. Because Chinese characters are numerous and structurally complicated, it is difficult for bots to crack Chinese character CAPTCHAs, and they have therefore been widely applied in China. Nevertheless, effective attack approaches are necessary to help CAPTCHA designers find security vulnerabilities and improve defense mechanisms. To deal with noisy variable-length Chinese character CAPTCHAs, an automatic attack approach is proposed that comprises preprocessing, character segmentation and character recognition. For character recognition, two methods are proposed: MGLCR (Multi-scale Gabor and Logistic regression based CAPTCHA Recognition) and CCR (Convolutional neural network based CAPTCHA Recognition). MGLCR extracts features with multi-scale Gabor filters and classifies characters with logistic regression. CCR extracts features and recognizes characters automatically with a CNN (Convolutional Neural Network). Experimental results show that the proposed approaches are efficient in attacking noisy variable-length Chinese character CAPTCHAs. The pros and cons of the proposed MGLCR and CCR methods, both of which outperform state-of-the-art methods, are discussed. In addition, the proposed methods achieve satisfactory results in breaking mixed-character CAPTCHAs that consist of English letters, Arabic digits, Chinese characters and mathematical operators.
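
The feature-extraction idea behind MGLCR can be illustrated with a minimal sketch. The abstract does not give the filter parameters, so everything concrete here is an assumption made for illustration: the wavelengths, the number of orientations, the choice of sigma as half the wavelength, the 4×4 mean pooling, and the function names `gabor_kernel`, `convolve_same` and `multiscale_gabor_features`. It is not the authors' implementation, only one plausible way to turn a segmented character image into a multi-scale Gabor feature vector using NumPy.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (cosine) Gabor kernel; `size` is assumed to be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the cosine carrier oscillates along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength)
    return envelope * carrier


def convolve_same(image, kernel):
    """Zero-padded 2-D convolution via FFT, cropped to the input size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]


def multiscale_gabor_features(image, wavelengths=(4.0, 8.0, 16.0),
                              n_orientations=4, grid=4):
    """Mean-pooled Gabor magnitudes over several scales and orientations.

    All default parameter values are illustrative assumptions.
    """
    image = np.asarray(image, dtype=float)
    features = []
    for wavelength in wavelengths:
        sigma = 0.5 * wavelength                 # assumed scale/bandwidth link
        size = 2 * int(np.ceil(3 * sigma)) + 1   # odd size covering +/- 3 sigma
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma)
            response = np.abs(convolve_same(image, kernel))
            # Pool the response over a grid x grid layout of cells.
            for rows in np.array_split(response, grid, axis=0):
                for cell in np.array_split(rows, grid, axis=1):
                    features.append(cell.mean())
    return np.asarray(features)


# Usage: a synthetic 32x32 "character" made of vertical stripes (period 8 px).
stripes = np.tile(np.cos(2.0 * np.pi * np.arange(32) / 8.0), (32, 1))
feature_vector = multiscale_gabor_features(stripes)
```

With the default settings, the vector has 3 scales × 4 orientations × 16 cells = 192 entries. In an MGLCR-style pipeline, vectors like this, computed for each segmented character, would be passed to a multiclass logistic regression classifier; that step is omitted here to keep the sketch dependency-free.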

Acknowledgements

This work is supported by the National Natural Science Foundation of China under grant No. 61303094, by the Science and Technology Commission of Shanghai Municipality under grants No. 16111107800 and No. 16511102400, by the Innovation Program of Shanghai Municipal Education Commission under grant No. 14YZ024, and by the Shanghai Key Laboratory of Financial Information Technology.

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Xing Wu, Shuji Dai & Yike Guo
Intelligent Software Systems Laboratory, Iwate Prefectural University, 020-0193, Iwate, Japan
Hamido Fujita

Corresponding author

Correspondence to Xing Wu.

About this article

Cite this article

Wu, X., Dai, S., Guo, Y. et al. A machine learning attack against variable-length Chinese character CAPTCHAs. Appl Intell 49, 1548–1565 (2019). https://doi.org/10.1007/s10489-018-1342-8

Published: 20 November 2018
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s10489-018-1342-8
