Robust Video Text Detection with Morphological Filtering Enhanced MSER

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Video text detection is challenging because video backgrounds are generally complex, and subtitles often suffer from color bleeding, fuzzy boundaries, and low contrast caused by lossy video compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use the gradient amplitude map (GAM) to enhance the edges of the input image, which mitigates color bleeding and fuzzy boundaries. Second, we develop a two-direction morphological filter that suppresses background noise and enhances the contrast between background and text. Third, we apply maximally stable extremal regions (MSER) to detect text regions with two extreme colors; to obtain optimal segmentations, we use the mean intensities of these regions as the label set for graph cuts and the Euclidean distance over the three channels of the HSI color space as the graph cuts smoothness term. Finally, we group the segmented regions into text lines using the geometric characteristics of text, and then eliminate non-text regions using corner detection, multi-frame verification, and some heuristic rules. We test our scheme on several challenging videos, and the results show that our text detection framework is more robust than previous methods.
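
The first two stages of the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, because the abstract does not give the exact operators. The sketch assumes that the GAM is the per-pixel gradient magnitude from central differences. It also assumes that the two-direction filter is a white top-hat taken with a horizontal and a vertical line structuring element, with the two results combined by a pixel-wise minimum. All function names and the `length` parameter are hypothetical.

```python
import numpy as np

def gradient_amplitude_map(gray):
    """Per-pixel gradient magnitude (an assumed definition of the GAM)."""
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)  # central differences along rows, then columns
    return np.hypot(gx, gy)

def _line_filter(img, length, axis, reduce_fn, pad_value):
    """Sliding min or max over a line window of `length` pixels along `axis`."""
    if length % 2 == 0:
        raise ValueError("length must be odd")
    half = length // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    n = img.shape[axis]
    # One shifted copy of the image per offset inside the window.
    windows = [np.take(padded, np.arange(k, k + n), axis=axis) for k in range(length)]
    return reduce_fn(np.stack(windows), axis=0)

def opening(img, length, axis):
    """Grayscale opening with a line structuring element: erosion, then dilation."""
    eroded = _line_filter(img, length, axis, np.min, np.inf)
    return _line_filter(eroded, length, axis, np.max, -np.inf)

def two_direction_tophat(img, length=5):
    """White top-hat in two directions, combined by pixel-wise minimum (assumption)."""
    img = np.asarray(img, dtype=np.float64)
    horizontal = img - opening(img, length, axis=1)
    vertical = img - opening(img, length, axis=0)
    # Keep only bright structures shorter than `length` in both directions.
    return np.minimum(horizontal, vertical)
```

Under these assumptions, an isolated bright blob survives the filter, while a flat background or a long bright line is suppressed. Whether a minimum, a maximum, or some other combination matches the paper's filter cannot be determined from the abstract alone.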

Author information

Corresponding author

Correspondence to Hu-Chuan Lu.

Additional information

Special Section on Object Recognition

About this article

Cite this article

Zhuge, YZ., Lu, HC. Robust Video Text Detection with Morphological Filtering Enhanced MSER. J. Comput. Sci. Technol. 30, 353–363 (2015). https://doi.org/10.1007/s11390-015-1528-z

  • DOI: https://doi.org/10.1007/s11390-015-1528-z
