Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition

Multimedia Tools and Applications

Abstract

Over the last few years, CAPTCHAs have become ubiquitous on the Internet as a security mechanism for distinguishing humans from spam bots. Text-based CAPTCHAs require users to recognize distorted text in challenge images. Because they are based on a hard AI problem, they have emerged as a hot research topic in computer vision and machine learning. The security of contemporary text-based CAPTCHAs rests on the segmentation problem: decomposing a challenge image into sub-images of individual characters. This remains a challenging task for current OCR programs and is still largely unsolved. In this paper, we present a novel segmentation and recognition method for text-based CAPTCHAs that combines simple image processing techniques, including thresholding, thinning and pixel count methods, with an artificial neural network. We attack popular CCT (Crowded Characters Together) CAPTCHAs and compare our results with other schemes. Overall, our system achieves a precision of 51.3%, 27.1% and 53.2% on the Taobao, MSN and eBay datasets of 1000, 500 and 1000 CAPTCHAs, respectively. The benefits of this research are twofold: by recognizing text-based CAPTCHAs, we not only expose weaknesses in their current design but also find a way to segment and recognize connected characters in images. The proposed algorithm can also be applied to the digitization of ancient books, handwriting recognition and similar tasks.
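The image-processing stages named in the abstract (thresholding, thinning, pixel counting) can be illustrated with the following minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: Otsu's method stands in for the thresholding step, Zhang and Suen's parallel thinning algorithm for the thinning step, and a vertical projection (text pixels per column) for the pixel count method, with a hypothetical `max_ink` parameter marking columns thin enough to cut. Images are plain nested lists of grey levels 0..255 with dark text on a light background, and the neural-network recognizer is omitted.

```python
# Minimal sketch, not the paper's code: Otsu thresholding, Zhang-Suen
# thinning and a column pixel-count split on plain nested lists.
# Assumption: dark text (low grey values) on a light background.
from typing import List, Tuple

Image = List[List[int]]


def otsu_threshold(gray: Image) -> int:
    """Grey level t (0..255) that maximises Otsu's between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and grey-level sum of the class <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray: Image, t: int) -> Image:
    """1 = text (grey <= t), 0 = background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def zhang_suen_thin(img: Image) -> Image:
    """Zhang-Suen thinning on a copy; border pixels are never examined."""
    img = [row[:] for row in img]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                         img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                         img[r][c - 1], img[r - 1][c - 1]]
                    b = sum(p)  # number of text neighbours
                    # a = number of 0 -> 1 transitions around P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete after the full pass (parallel rule)
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


def segment_columns(binary: Image, max_ink: int = 1) -> List[Tuple[int, int]]:
    """Split at columns with <= max_ink text pixels; spans are [start, end)."""
    counts = [sum(col) for col in zip(*binary)]
    spans, start = [], None
    for x, n in enumerate(counts):
        if n > max_ink and start is None:
            start = x
        elif n <= max_ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(counts)))
    return spans


if __name__ == "__main__":
    # Two dark blocks joined by a one-pixel bridge in the middle row.
    gray = [[0, 0, 255, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 255, 0, 0]]
    binary = binarize(gray, otsu_threshold(gray))
    print(segment_columns(binary, max_ink=1))  # → [(0, 2), (3, 5)]
```

After thinning, a stroke that joins two touching characters is often reduced to a single pixel per column, which is why low-count columns are treated as cut candidates in this sketch. Real CCT CAPTCHAs also contain one-pixel columns inside characters, so a practical segmenter would need further rules (for example, expected character widths) before passing the pieces to a classifier.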

Figs. 1-9 (images not included in this preview)

Author information

Correspondence to Rafaqat Hussain.

About this article

Cite this article

Hussain, R., Gao, H. & Shaikh, R.A. Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition. Multimed Tools Appl 76, 25547–25561 (2017). https://doi.org/10.1007/s11042-016-4151-2

  • DOI: https://doi.org/10.1007/s11042-016-4151-2
