Abstract
Manual data annotation is a time-consuming activity. This paper presents a novel strategy for automatically training a CAPTCHA-breaking system without manual dataset creation. We demonstrate the feasibility of the attack against a text-based CAPTCHA scheme using network infrastructure similar to that used for Denial of Service attacks. The main goal of our research is to expose a possible vulnerability in CAPTCHA systems when a brute-force attack is combined with transfer learning. The classification step uses a simple convolutional neural network with 15 layers. The training stage uses an automatically prepared dataset, created without any human intervention, and transfer learning to fine-tune the deep neural network classifier. The designed system achieved 80% classification accuracy after 6 fine-tuning steps on a 5-digit text-based CAPTCHA system. The results presented in this paper suggest that even a simple attack with a large number of attacking computers can be an effective alternative to current CAPTCHA-breaking systems.
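The idea of building a labelled dataset without human annotation can be illustrated with a small simulation. The sketch below is not the authors' implementation; it assumes that the target service reveals whether a submitted answer was accepted, so every accepted guess becomes a confirmed training sample and every rejected guess is simply discarded. All names (`make_toy_service`, `make_toy_guesser`, `collect_confirmed_labels`) are hypothetical, and the guesser is a random stand-in for the convolutional network.

```python
import random

DIGITS = "0123456789"


def make_toy_service(n_challenges, rng):
    """Hypothetical stand-in for the target site: hidden 5-digit answers."""
    answers = {
        cid: "".join(rng.choice(DIGITS) for _ in range(5))
        for cid in range(n_challenges)
    }

    def verify(cid, text):
        # The attacker only learns accept/reject, never the answer itself.
        return answers[cid] == text

    return answers, verify


def make_toy_guesser(answers, per_digit_accuracy, rng):
    """Stand-in for the classifier: each digit is right with a fixed probability."""

    def guess(cid):
        out = []
        for digit in answers[cid]:
            if rng.random() < per_digit_accuracy:
                out.append(digit)
            else:
                out.append(rng.choice([d for d in DIGITS if d != digit]))
        return "".join(out)

    return guess


def collect_confirmed_labels(challenge_ids, guess, verify):
    """Keep only the guesses that the service itself accepted."""
    confirmed = []
    for cid in challenge_ids:
        text = guess(cid)
        if verify(cid, text):
            confirmed.append((cid, text))
    return confirmed


rng = random.Random(0)
answers, verify = make_toy_service(2000, rng)
guess = make_toy_guesser(answers, per_digit_accuracy=0.6, rng=rng)
dataset = collect_confirmed_labels(range(2000), guess, verify)
print(len(dataset), "confirmed samples out of", len(answers), "attempts")
```

In a real attack the confirmed pairs would be used to fine-tune the pre-trained network, and the collection would be repeated with the improved classifier. With a weak initial classifier only a small fraction of attempts succeed, which is why the number of attacking machines matters.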
Acknowledgements
The completion of this paper was made possible by the grant No. FEKT-S-20-6205 - “Research in Automation, Cybernetics and Artificial Intelligence within Industry 4.0” financially supported by the Internal science fund of Brno University of Technology.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Bostik, O., Horak, K., Kratochvila, L. et al. Semi-supervised deep learning approach to break common CAPTCHAs. Neural Comput & Applic 33, 13333–13343 (2021). https://doi.org/10.1007/s00521-021-05957-0
DOI: https://doi.org/10.1007/s00521-021-05957-0