ICDAR 2003 robust reading competitions: entries, results, and future directions

Lucas, Simon M.; Panaretos, Alex; Sosa, Luis; Tang, Anthony; Wong, Shirley; Young, Robert; Ashida, Kazuki; Nagai, Hiroki; Okamoto, Masayuki; Yamamoto, Hiroaki; Miyao, Hidetoshi; Zhu, JunMin; Ou, WuWen; Wolf, Christian; Jolion, Jean-Michel; Todoran, Leon; Worring, Marcel; Lin, Xiaofan

doi:10.1007/s10032-004-0134-3

Published: July 2005

Volume 7, pages 105–122, (2005)

International Journal of Document Analysis and Recognition (IJDAR)

Simon M. Lucas¹,
Alex Panaretos¹,
Luis Sosa¹,
Anthony Tang¹,
Shirley Wong¹,
Robert Young¹,
Kazuki Ashida²,
Hiroki Nagai²,
Masayuki Okamoto²,
Hiroaki Yamamoto²,
Hidetoshi Miyao²,
JunMin Zhu³,
WuWen Ou³,
Christian Wolf⁴,
Jean-Michel Jolion⁴,
Leon Todoran⁵,
Marcel Worring⁵ &
Xiaofan Lin⁶

Abstract.

This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets and gain a clear understanding of the current state of the art. We use the term ‘robust reading’ to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three subproblems and run competitions for each stage, and also a competition for the best overall system. The subproblems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hoped to gain a better understanding of the state of the art in each of the subproblems. Furthermore, our methodology involved storing detailed results of applying each algorithm to each image in the datasets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text-locating contest was the only one to have any entries. We give a brief description of each entry and present the results of this contest, showing cases where the leading entries succeed and fail. We also describe an algorithm for combining the outputs of the individual text locators and show how the combination scheme improves on any of the individual systems.
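The abstract does not spell out how text-locating results were stored or scored, so the following is only an illustrative sketch of per-image evaluation under stated assumptions. It assumes axis-aligned rectangles, a match score equal to the intersection area divided by the area of the smallest rectangle enclosing both boxes, and precision and recall computed from each rectangle's best match. The names `Rect`, `match`, `score_image` and `evaluate`, and the use of the harmonic mean as a combined score, are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def match(a: Rect, b: Rect) -> float:
    """Assumed match score: intersection area over the area of the
    smallest rectangle that encloses both a and b (1.0 when identical)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    ex = max(a.x + a.w, b.x + b.w) - min(a.x, b.x)
    ey = max(a.y + a.h, b.y + b.h) - min(a.y, b.y)
    enclosing = ex * ey
    return (ix * iy) / enclosing if enclosing > 0 else 0.0


def best_match(r: Rect, candidates: list[Rect]) -> float:
    """Best match score of r against any candidate (0.0 if there are none)."""
    return max((match(r, c) for c in candidates), default=0.0)


def score_image(estimates: list[Rect], targets: list[Rect]) -> dict[str, float]:
    """Precision, recall and their harmonic mean for a single image."""
    p = sum(best_match(e, targets) for e in estimates) / len(estimates) if estimates else 0.0
    r = sum(best_match(t, estimates) for t in targets) / len(targets) if targets else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"precision": p, "recall": r, "f": f}


def evaluate(outputs: dict[str, dict[str, list[Rect]]],
             ground_truth: dict[str, list[Rect]]) -> dict[tuple[str, str], dict[str, float]]:
    """Store one score record per (algorithm, image) pair so that
    individual successes and failures can be inspected later."""
    return {
        (algo, image): score_image(rects.get(image, []), targets)
        for algo, rects in outputs.items()
        for image, targets in ground_truth.items()
    }
```

Keeping one record per algorithm and image, rather than a single aggregate figure, is what makes it possible to look up the specific images on which a given locator succeeds or fails.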

Author information

Authors and Affiliations

Department of Computer Science, University of Essex, CO4 3SQ, Colchester, UK
Simon M. Lucas, Alex Panaretos, Luis Sosa, Anthony Tang, Shirley Wong & Robert Young
Department of Information Engineering, Faculty of Engineering, Shinshu University, 4-17-1 Wakasato Nagano, 380-8553, Japan
Kazuki Ashida, Hiroki Nagai, Masayuki Okamoto, Hiroaki Yamamoto & Hidetoshi Miyao
Institute of Automation, Chinese Academy of Science, PO Box 2738, 100080, Beijing, P.R. China
JunMin Zhu & WuWen Ou
Lyon Research Center for Images and Intelligent Information Systems (LIRIS), INSA de Lyon, Bt. J. Verne 20, rue Albert Einstein, 69621, Villeurbanne cedex, France
Christian Wolf & Jean-Michel Jolion
Informatics Institute, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Leon Todoran & Marcel Worring
Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203, CA 94304, Palo Alto, USA
Xiaofan Lin

Additional information

Published online: 21 June 2005

About this article

Cite this article

Lucas, S.M., Panaretos, A., Sosa, L. et al. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR 7, 105–122 (2005). https://doi.org/10.1007/s10032-004-0134-3

Issue Date: July 2005
DOI: https://doi.org/10.1007/s10032-004-0134-3
