DOI: 10.1145/3336191.3371795
research-article

Text Recognition Using Anonymous CAPTCHA Answers

Published: 22 January 2020

Abstract

Internet companies use crowdsourcing to collect the large amounts of labeled data needed to build products based on machine learning. A significant source of such labels for OCR data sets is (re)CAPTCHA, which distinguishes humans from automated bots by asking users to recognize text and, at the same time, collects new labeled data. An important component of this approach to data collection is reducing the noisy labels produced by bots and unqualified users.
In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA. We employ incremental relabeling to minimize the number of guesses needed to obtain accurately recognized text. The aggregation model and the stopping rule for our incremental relabeling are based on novel machine learning techniques and use meta features of CAPTCHA tasks and accumulated guesses. Our experiments show that our approach can provide a large amount of accurately recognized text using a minimal number of user guesses. Finally, we report substantial improvements in an optical character recognition model after implementing our approach at Yandex.
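
To make the idea of incremental relabeling with a stopping rule concrete, here is a minimal sketch. It is not the algorithm proposed in the paper: the abstract describes a learned aggregation model and a learned stopping rule that use meta features, whereas this sketch substitutes plain majority voting over normalized guesses and a fixed vote-share threshold. The function names and the parameters min_guesses, max_guesses, and threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def normalize(guess: str) -> str:
    """Lowercase and collapse whitespace so trivially different guesses match."""
    return " ".join(guess.lower().split())


def aggregate(guesses: List[str]) -> Tuple[str, float]:
    """Return the most frequent normalized guess and its share of all guesses.

    Assumption: simple majority vote stands in for a learned aggregation model.
    """
    counts = Counter(normalize(g) for g in guesses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(guesses)


def label_incrementally(
    stream: Iterable[str],
    min_guesses: int = 2,
    max_guesses: int = 5,
    threshold: float = 0.75,
) -> Tuple[Optional[str], int]:
    """Consume anonymous guesses one at a time and stop as early as possible.

    Assumption: the stopping rule is a fixed vote-share threshold, not a
    learned model. Returns (accepted text or None, number of guesses used).
    """
    collected: List[str] = []
    for guess in stream:
        collected.append(guess)
        if len(collected) < min_guesses:
            continue  # too few guesses to judge agreement
        answer, share = aggregate(collected)
        if share >= threshold:
            return answer, len(collected)  # confident enough: stop asking
        if len(collected) >= max_guesses:
            break  # budget exhausted without agreement
    return None, len(collected)
```

For example, label_incrementally(["Hello", "hello ", "world"]) accepts "hello" after two guesses and never requests the third, which is the saving that incremental relabeling aims for. An image whose guesses never agree is returned as None once the budget is spent.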

Cited By

  • (2021) Eye Gaze and Interaction Differences of Holistic Versus Analytic Users in Image-Recognition Human Interaction Proof Schemes. HCI for Cybersecurity, Privacy and Trust, 66-75. DOI: 10.1007/978-3-030-77392-2_5. Online publication date: 3-Jul-2021

Published In

WSDM '20: Proceedings of the 13th International Conference on Web Search and Data Mining
January 2020
950 pages
ISBN:9781450368223
DOI:10.1145/3336191
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aggregation
  2. captcha
  3. crowdsourcing
  4. incremental labeling
  5. noisy labels
  6. text recognition

Qualifiers

  • Research-article

Conference

WSDM '20

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

  • Downloads (Last 12 months): 17
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 05 Mar 2025
