A framework for improved video text detection and recognition

Yang, Haojin; Quehl, Bernhard; Sack, Harald

doi:10.1007/s11042-012-1250-6


Published: 11 October 2012

Volume 69, pages 217–245, (2014)

Multimedia Tools and Applications

Haojin Yang¹,
Bernhard Quehl¹ &
Harald Sack¹


Abstract

Text displayed in a video is an essential part of the high-level semantic information of the video content. Video text can therefore serve as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. For the text detection stage, we have developed a fast localization-verification scheme in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. The detected candidate text lines are then refined using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method that separates text from complex backgrounds so that it can be processed by standard OCR (Optical Character Recognition) software. The operability and accuracy of the proposed text detection and binarization methods have been evaluated using publicly available test data sets.
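
As a rough illustration of the entropy-based refinement step named in the abstract, the following Python sketch computes the Shannon entropy of a candidate region's grey-level histogram and discards regions that fall below a threshold. The abstract does not specify how the filter is defined, so the function names, the direction of the test (low entropy is treated as background), and the `min_entropy` value are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a flat pixel sequence."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_candidates(candidates, min_entropy=1.0):
    """Keep candidate regions whose entropy reaches min_entropy.

    min_entropy is a hypothetical threshold chosen for this sketch only.
    """
    return [region for region in candidates if shannon_entropy(region) >= min_entropy]


# Two toy candidate regions, each given as a flat list of grey values.
flat_background = [128] * 64   # uniform region, no structure
text_like = [0, 255] * 32      # evenly split between dark and bright pixels

kept = filter_candidates([flat_background, text_like])
```

In this toy input the uniform region has an entropy of 0 bits and is dropped, while the region split evenly between two grey levels has an entropy of 1 bit and is kept.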


Notes

http://www.youtube.com
Mediaglobe is an SME project of the THESEUS research program, supported by the German Federal Ministry of Economics and Technology on the basis of a decision by the German Bundestag, cf. http://www.projekt-mediaglobe.de/ (last access: 14/09/2012).
Text localization is the first task of “reading text in born-digital images (web and email)” challenge.
http://www.yovisto.com/labs/VideoOCR/ (last access: 14/09/2012)
http://trecvid.nist.gov/ (last access: 14/09/2012)
http://code.google.com/p/tesseract-ocr/ (last access: 14/09/2012)
http://liris.cnrs.fr/christian.wolf/software/binarize/index.html (last access: 14/09/2012)
\(\mathit{k}\) serves as a constant parameter used to determine the local threshold in [21, 27, 34].
http://hunspell.sourceforge.net/ (last access: 14/09/2012)
http://finereader.abbyy.com/
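
The note on \(k\) above refers to local thresholding methods. Assuming the bracketed numbers index the alphabetical reference list, two of the cited works would be Niblack and Sauvola and Pietikäinen, whose thresholds are commonly written as \(T = m + k \cdot s\) and \(T = m \cdot (1 + k \cdot (s/R - 1))\), with \(m\) and \(s\) the mean and standard deviation of the local window. The Python sketch below shows where \(k\) enters each formula. The default values of `k` and `r` are commonly quoted ones and are assumptions here, not settings taken from this article.

```python
import math


def local_stats(window):
    """Mean and (population) standard deviation of a flat list of grey values."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((p - mean) ** 2 for p in window) / n
    return mean, math.sqrt(variance)


def niblack_threshold(window, k=-0.2):
    """Niblack-style local threshold: T = m + k * s (k is an assumed default)."""
    mean, std = local_stats(window)
    return mean + k * std


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola-style local threshold: T = m * (1 + k * (s / R - 1)).

    k and r (the dynamic range of the standard deviation) are assumed defaults.
    """
    mean, std = local_stats(window)
    return mean * (1.0 + k * (std / r - 1.0))


# A window with strong contrast: mean 100, standard deviation 100.
window = [0, 200]
t_niblack = niblack_threshold(window)
t_sauvola = sauvola_threshold(window)
```

In both formulas, \(k\) scales how strongly the local contrast shifts the threshold away from the local mean.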

References

Anthimopoulos M, Gatos B, Pratikakis I (2010) A two-stage scheme for text detection in video images. J Image Vis Comput 28:1413–1426
Bhaskar H, Mihaylova L (2010) Combined feature-level video indexing using block-based motion estimation. In: Proc. of 13th conference on information fusion (FUSION). Edinburgh, pp 1–8
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Chen D, Odobez JM, Bourlard H (2004) Text detection and recognition in images and video frames. J Pattern Recogn Soc 37(3):595–608
Deza MM, Deza E (2009) Encyclopedia of distances. Springer
Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: Proc. of international conference on computer vision and pattern recognition, pp 2963–2970
Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of 17th international conference on pattern recognition (ICPR’04), vol 1, pp 425–428
Gllavata J, Qeli E, Freisleben B (2006) Detecting text in videos using fuzzy clustering ensembles. In: Proceedings of the 8th IEEE international symposium on multimedia, ISM ’06. IEEE Computer Society. Washington, DC, pp 283–290
Hanif SM, Prevost L (2009) Text detection and localization in complex scene images using constrained AdaBoost algorithm. In: Proceedings of the 2009 10th international conference on document analysis and recognition, ICDAR ’09. IEEE Computer Society. Washington, DC, pp 1–5
Hua XS, Chen XY, Zhang HJ (2001) Automatic location of text in video frames. In: Proc. of ACM multimedia 2001 workshops: multimedia information retrieval, pp 24–27
Hua XS, Liu WY, Zhang HJ (2004) An automatic performance evaluation protocol for video text detection algorithms. IEEE Trans Circuits Syst Video Technol 14(4):498–507
ICDAR RWR (2011) http://www.cvc.uab.es/icdar2011competition/?com=results (last access: 10/07/2012)
Jung K, Kim KI, Jain AK (2004) Text information extraction in images and video: a survey. Pattern Recogn 37(5):977–997
Karatzas D, Mestre SR, Mas J, Nourbakhsh F, Roy PP (2011) ICDAR 2011 robust reading competition: challenge 1: reading text in born-digital images (web and email). In: Proc. international conference on document analysis and recognition (ICDAR). Beijing, pp 1485–1490
Keysers D (2006) Comparison and combination of state-of-the-art techniques for handwritten character recognition: topping the MNIST benchmark
Kim HH (2011) Toward video semantic search based on a structured folksonomy. J Am Soc Inf Sci Technol 62(3):478–492
Kim KI, Jung K, Park SH, Kim HJ (2001) Support vector machine-based text detection in digital video. Pattern Recogn 34(2):527–529
Li H, Kia O, Doermann D (1999) Text enhancement in digital video. In: Proc. of SPIE, document recognition IV, pp 1–8
Li H, Doermann DS, Kia OE (2000) Automatic text detection and tracking in digital video. IEEE Trans Image Process 9(1):147–156
Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268
Niblack W (1986) An introduction to digital image processing. Prentice-Hall, Englewood Cliffs
Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recogn 29(1):51–59
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Pan YF, Hou X, Liu CL (2008) A robust system to detect and localize texts in natural scene images. In: Proceedings of the 2008 the eighth IAPR international workshop on document analysis systems, DAS ’08. IEEE Computer Society. Washington, DC, pp 35–42
Qian X, Liu G, Wang H, Su R (2007) Text detection, localization and tracking in compressed video. In: Proc. of international conference on signal processing: image communication, pp 752–768
Sato T, Kanade T, Hughes EK, Smith MA, Satoh S (1999) Video OCR: indexing digital news libraries by recognition of superimposed captions. Multimedia Syst 7(5):385–395
Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
Serra J (1983) Image analysis and mathematical morphology. Academic Press, Orlando
Shivakumara P, Phan TQ, Tan CL (2009) Video text detection based on filters and edge features. In: Proc. of the 2009 international conference on multimedia and expo. IEEE, pp 1–4
Sobel I (1990) An isotropic 3×3 image gradient operator. In: Machine vision for three-dimensional scenes, pp 376–379
Sobottka K, Bunke H, Kronenberg H (1999) Identification of text on colored book and journal covers. In: Proc. of international conference on document analysis and recognition, pp 57–63
Sonnenburg S, Rätsch G, Henschel S, Widmer C, Behr J, Zien A, Bona FD, Binder A, Gehl C, Franc V (2010) The shogun machine learning toolbox. J Mach Learn Res 11:1799–1802
Thillou CM, Gosselin B (2007) Color text extraction with selective metric-based clustering. Comput Vis Image Underst 107:1–2
Wolf C, Jolion JM, Chassaing F (2002) Text localization, enhancement and binarization in multimedia documents. In: Proc. of the international conference on pattern recognition, vol 2, pp 1037–1040
Yang H, Siebert M, Lühner P, Sack H, Meinel C (2011) Automatic lecture video indexing using video OCR technology. In: Proc. of international symposium on multimedia (ISM), pp 111–116
Zeng C, Ma H (2010) Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting. In: Proceedings of the 2010 20th international conference on pattern recognition, ICPR ’10. IEEE Computer Society. Washington, DC, pp 2069–2072
Zhao M, Li S, Kwok J (2010) Text detection in images using sparse representation with discriminative dictionaries. J Image Vis Comput 28:1590–1599
Zhong Y, Zhang HJ, Jain A (2000) Automatic caption localization in compressed video. IEEE Trans Pattern Anal Mach Intell 22(4):385–392
Zhou Z, Li L, Tan CL (2010) Edge based binarization for video text images. In: Proc. of 20th international conference on pattern recognition. Singapore, pp 133–136


Acknowledgement

This work has been supported by the Mediaglobe project. Mediaglobe is an SME project of the THESEUS research program, supported by the German Federal Ministry of Economics and Technology on the basis of a decision by the German Bundestag (FKZ: 01MQ09031).

Author information

Authors and Affiliations

Hasso-Plattner-Institute for IT-Systems Engineering, University of Potsdam, Prof.-Dr.-Helmert Str. 2-4, 14467, Potsdam, Germany
Haojin Yang, Bernhard Quehl & Harald Sack


Corresponding author

Correspondence to Haojin Yang.


About this article

Cite this article

Yang, H., Quehl, B. & Sack, H. A framework for improved video text detection and recognition. Multimed Tools Appl 69, 217–245 (2014). https://doi.org/10.1007/s11042-012-1250-6


Published: 11 October 2012
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11042-012-1250-6
