A framework for improved video text detection and recognition

Multimedia Tools and Applications

Abstract

Text displayed in a video is an essential source of high-level semantic information about the video content. Video text can therefore be exploited for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. The detected candidate text lines are then refined by an image entropy-based filter. Finally, verification procedures based on the Stroke Width Transform (SWT) and a Support Vector Machine (SVM) are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method that separates text from complex backgrounds so that it can be processed by standard OCR (Optical Character Recognition) software. The operability and accuracy of the proposed text detection and binarization methods have been evaluated on publicly available test data sets.
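The abstract does not state how the image entropy-based filter scores a candidate text line. The sketch below assumes the common definition of image entropy as the Shannon entropy \(H = -\sum_i p_i \log_2 p_i\) of a region's gray-level histogram, plus a hypothetical rule that discards candidates whose entropy falls below a threshold `min_entropy`. The function names and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Sequence


def image_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the gray-level histogram of a region.

    `pixels` is assumed to be a flat sequence of 8-bit gray values (0-255).
    """
    if not pixels:
        return 0.0
    n = len(pixels)
    counts = Counter(pixels)  # histogram: gray level -> number of pixels
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_candidate(pixels: Sequence[int], min_entropy: float = 1.0) -> bool:
    """Hypothetical filter rule: keep a candidate text line only if its
    entropy reaches `min_entropy` (an illustrative value, not the paper's)."""
    return image_entropy(pixels) >= min_entropy


if __name__ == "__main__":
    flat_background = [128] * 100          # a single gray level
    text_like = [0] * 50 + [255] * 50      # dark strokes on a bright background
    print(image_entropy(flat_background), keep_candidate(flat_background))
    print(image_entropy(text_like), keep_candidate(text_like))
```

The intuition behind such a filter is that a uniform background region concentrates its pixels in a few gray levels and so has low entropy, whereas text strokes on a background mix at least two intensity populations and raise it.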

Notes

  1. http://www.youtube.com

  2. Mediaglobe is an SME project of the THESEUS research program, supported by the German Federal Ministry of Economics and Technology on the basis of a decision by the German Bundestag, cf. http://www.projekt-mediaglobe.de/ (last access: 14/09/2012).

  3. Text localization is the first task of the ICDAR 2011 Robust Reading Competition challenge “Reading text in born-digital images (web and email)”.

  4. http://www.yovisto.com/labs/VideoOCR/ (last access: 14/09/2012)

  5. http://trecvid.nist.gov/ (last access: 14/09/2012)

  6. http://code.google.com/p/tesseract-ocr/ (last access: 14/09/2012)

  7. http://liris.cnrs.fr/christian.wolf/software/binarize/index.html (last access: 14/09/2012)

  8. \(k\) is a constant parameter used to determine the local threshold in [21, 27, 34].

  9. http://hunspell.sourceforge.net/ (last access: 14/09/2012)

  10. http://finereader.abbyy.com/
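Note 8 refers to the constant \(k\) in the local thresholding methods of Niblack [21], Sauvola and Pietikäinen [27], and Wolf et al. [34]. To show where \(k\) enters, the sketch below computes each method's threshold for a single local window, using the formulas as they are commonly stated in the literature: Niblack \(T = m + k s\), Sauvola \(T = m\,(1 + k\,(s/R - 1))\), and Wolf \(T = (1-k)\,m + k M + k\,(s/R_{\max})\,(m - M)\). Here \(m\) and \(s\) are the window mean and standard deviation, \(R\) is the dynamic range of the standard deviation, \(M\) is the minimum gray level of the image, and \(R_{\max}\) is the maximum local standard deviation over all windows. The function names are hypothetical, and the default values of \(k\) and \(R\) are typical choices, not the settings used in this paper.

```python
import statistics
from typing import Sequence

# Hypothetical helpers: each returns the local threshold T for one window of
# 8-bit gray values. Classifying the window's center pixel against T (and
# sliding the window over the image) is omitted.


def niblack_threshold(window: Sequence[float], k: float = -0.2) -> float:
    """Niblack: T = m + k * s (k is commonly chosen around -0.2)."""
    m = statistics.fmean(window)
    s = statistics.pstdev(window)
    return m + k * s


def sauvola_threshold(window: Sequence[float], k: float = 0.5,
                      r: float = 128.0) -> float:
    """Sauvola: T = m * (1 + k * (s / R - 1)), with R = 128 for 8-bit images."""
    m = statistics.fmean(window)
    s = statistics.pstdev(window)
    return m * (1 + k * (s / r - 1))


def wolf_threshold(window: Sequence[float], image_min: float,
                   max_std: float, k: float = 0.5) -> float:
    """Wolf et al.: T = (1 - k) * m + k * M + k * (s / R_max) * (m - M).

    image_min is M, the minimum gray level of the whole image;
    max_std is R_max, the largest local standard deviation over all windows.
    """
    m = statistics.fmean(window)
    s = statistics.pstdev(window)
    return (1 - k) * m + k * image_min + k * (s / max_std) * (m - image_min)


if __name__ == "__main__":
    window = [0, 200]  # mean 100, population standard deviation 100
    print(niblack_threshold(window))  # 80.0
    print(sauvola_threshold(window))  # 89.0625
    print(wolf_threshold(window, image_min=0, max_std=100))  # 100.0
```

In each formula \(k\) controls how strongly the local contrast \(s\) shifts the threshold away from the local mean, which is why typical values of \(k\) differ in sign and magnitude between the methods.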

References

  1. Anthimopoulos M, Gatos B, Pratikakis I (2010) A two-stage scheme for text detection in video images. Image Vis Comput 28:1413–1426

  2. Bhaskar H, Mihaylova L (2010) Combined feature-level video indexing using block-based motion estimation. In: Proc. of 13th conference on information fusion (FUSION). Edinburgh, pp 1–8

  3. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698

  4. Chen D, Odobez JM, Bourlard H (2004) Text detection and recognition in images and video frames. Pattern Recogn 37(3):595–608

  5. Deza MM, Deza E (2009) Encyclopedia of distances. Springer

  6. Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp 2963–2970

  7. Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol 1, pp 425–428

  8. Gllavata J, Qeli E, Freisleben B (2006) Detecting text in videos using fuzzy clustering ensembles. In: Proceedings of the 8th IEEE international symposium on multimedia, ISM ’06. IEEE Computer Society. Washington, DC, pp 283–290

  9. Hanif SM, Prevost L (2009) Text detection and localization in complex scene images using constrained AdaBoost algorithm. In: Proceedings of the 2009 10th international conference on document analysis and recognition, ICDAR ’09. IEEE Computer Society. Washington, DC, pp 1–5

  10. Hua XS, Chen XY, Zhang HJ (2001) Automatic location of text in video frames. In: Proc. of ACM multimedia 2001 workshops: multimedia information retrieval, pp 24–27

  11. Hua XS, Liu WY, Zhang HJ (2004) An automatic performance evaluation protocol for video text detection algorithms. IEEE Trans Circuits Syst Video Technol 14(4):498–507

  12. ICDAR RWR (2011) http://www.cvc.uab.es/icdar2011competition/?com=results (last access: 10/07/2012)

  13. Jung K, Kim KI, Jain AK (2004) Text information extraction in images and video: a survey. Pattern Recogn 37(5):977–997

  14. Karatzas D, Mestre SR, Mas J, Nourbakhsh F, Roy PP (2011) ICDAR 2011 robust reading competition: challenge 1: reading text in born-digital images (web and email). In: Proc. international conference on document analysis and recognition (ICDAR). Beijing, pp 1485–1490

  15. Keysers D (2006) Comparison and combination of state-of-the-art techniques for handwritten character recognition: topping the MNIST benchmark

  16. Kim HH (2011) Toward video semantic search based on a structured folksonomy. J Am Soc Inf Sci Technol 62(3):478–492

  17. Kim KI, Jung K, Park SH, Kim HJ (2001) Support vector machine-based text detection in digital video. Pattern Recogn 34(2):527–529

  18. Li H, Kia O, Doermann D (1999) Text enhancement in digital video. In: Proc. of SPIE, document recognition IV, pp 1–8

  19. Li H, Doermann DS, Kia OE (2000) Automatic text detection and tracking in digital video. IEEE Trans Image Process 9(1):147–156

  20. Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268

  21. Niblack W (1986) An introduction to digital image processing. Prentice-Hall, Englewood Cliffs

  22. Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recogn 29(1):51–59

  23. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66

  24. Pan YF, Hou X, Liu CL (2008) A robust system to detect and localize texts in natural scene images. In: Proceedings of the 2008 the eighth IAPR international workshop on document analysis systems, DAS ’08. IEEE Computer Society. Washington, DC, pp 35–42

  25. Qian X, Liu G, Wang H, Su R (2007) Text detection, localization and tracking in compressed video. In: Proc. of international conference on signal processing: image communication, pp 752–768

  26. Sato T, Kanade T, Hughes EK, Smith MA, Satoh S (1999) Video OCR: indexing digital news libraries by recognition of superimposed captions. Multimedia Syst 7(5):385–395

  27. Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236

  28. Serra J (1983) Image analysis and mathematical morphology. Academic Press, Orlando

  29. Shivakumara P, Phan TQ, Tan CL (2009) Video text detection based on filters and edge features. In: Proc. of the 2009 international conference on multimedia and expo. IEEE, pp 1–4

  30. Sobel I (1990) An isotropic 3×3 image gradient operator. In: Machine vision for three-dimensional scenes, pp 376–379

  31. Sobottka K, Bunke H, Kronenberg H (1999) Identification of text on colored book and journal covers. In: Proc. of international conference on document analysis and recognition, pp 57–63

  32. Sonnenburg S, Rätsch G, Henschel S, Widmer C, Behr J, Zien A, Bona FD, Binder A, Gehl C, Franc V (2010) The shogun machine learning toolbox. J Mach Learn Res 11:1799–1802

  33. Thillou CM, Gosselin B (2007) Color text extraction with selective metric-based clustering. Comput Vis Image Underst 107:1–2

  34. Wolf C, Jolion JM, Chassaing F (2002) Text localization, enhancement and binarization in multimedia documents. In: Proc. of the international conference on pattern recognition, vol 2, pp 1037–1040

  35. Yang H, Siebert M, Lühner P, Sack H, Meinel C (2011) Automatic lecture video indexing using video OCR technology. In: Proc. of international symposium on multimedia (ISM), pp 111–116

  36. Zeng C, Ma H (2010) Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting. In: Proceedings of the 2010 20th international conference on pattern recognition, ICPR ’10. IEEE Computer Society. Washington, DC, pp 2069–2072

  37. Zhao M, Li S, Kwok J (2010) Text detection in images using sparse representation with discriminative dictionaries. Image Vis Comput 28:1590–1599

  38. Zhong Y, Zhang HJ, Jain A (2000) Automatic caption localization in compressed video. IEEE Trans Pattern Anal Mach Intell 22(4):385–392

  39. Zhou Z, Li L, Tan CL (2010) Edge based binarization for video text images. In: Proc. of 20th international conference on pattern recognition. Singapore, pp 133–136

Acknowledgement

This work has been supported by the Mediaglobe project. Mediaglobe is an SME project of the THESEUS research program, supported by the German Federal Ministry of Economics and Technology on the basis of a decision by the German Bundestag (FKZ: 01MQ09031).

Author information

Corresponding author

Correspondence to Haojin Yang.

About this article

Cite this article

Yang, H., Quehl, B. & Sack, H. A framework for improved video text detection and recognition. Multimed Tools Appl 69, 217–245 (2014). https://doi.org/10.1007/s11042-012-1250-6

  • DOI: https://doi.org/10.1007/s11042-012-1250-6
