Abstract
This paper describes the robust reading competitions for ICDAR 2003. Research on recognizing text in natural scenes has grown rapidly over the last few years, so there is an urgent need to establish common benchmark datasets and to gain a clear understanding of the current state of the art. We use the term ‘robust reading’ to refer to text images that are beyond the capabilities of current commercial OCR packages. We broke the robust reading problem down into three subproblems (text locating, character recognition and word recognition). We ran a competition for each of these stages and also a competition for the best overall system. By breaking down the problem in this way, we hoped to gain a better understanding of the state of the art in each subproblem. Our methodology also involved storing the detailed results of applying each algorithm to each image in the datasets, which allows researchers to study the strengths and weaknesses of each algorithm in depth. Only the text-locating contest received any entries. We give a brief description of each entry and present the results of this contest, showing cases where the leading entries succeed and where they fail. We also describe an algorithm for combining the outputs of the individual text locators and show that the combination scheme improves on each of the individual systems.
Published online: 21 June 2005
Cite this article
Lucas, S.M., Panaretos, A., Sosa, L. et al. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR 7, 105–122 (2005). https://doi.org/10.1007/s10032-004-0134-3
DOI: https://doi.org/10.1007/s10032-004-0134-3