Arbitrarily-oriented multi-lingual text detection in video

Multimedia Tools and Applications

Abstract

Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we explore moments for identifying text candidates. We introduce a novel method that uses stroke width information to determine window sizes automatically, so that moments can be extracted for multi-font and multi-sized text in video. Temporal information is used iteratively to find deviations between moving and non-moving pixels in successive frames, which yields static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are then analyzed to identify potential text candidates. Finally, a boundary growing step expands the boundary of each potential text candidate until it reaches neighboring components according to a nearest neighbor criterion, which outputs the text lines appearing in the video. Experimental results on standard video datasets, namely ICDAR 2013, ICDAR 2015 and YVT, and on our own English and multi-lingual videos demonstrate that the proposed method outperforms state-of-the-art methods.
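To make the idea of separating static from dynamic pixels more concrete, the following is a minimal sketch only and not the procedure described in the paper. The abstract states that temporal deviations are examined iteratively; this illustration instead makes a single pass and assumes that the population standard deviation of each pixel's gray value across frames is an acceptable deviation measure. The function name `split_static_dynamic`, the `threshold` value and the toy frames are all hypothetical.

```python
from statistics import pstdev


def split_static_dynamic(frames, threshold=10.0):
    """Split pixel coordinates into static and dynamic sets.

    frames: list of equally sized 2-D lists of gray values.
    threshold: hypothetical cut-off on the temporal standard deviation.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    height, width = len(frames[0]), len(frames[0][0])
    static, dynamic = set(), set()
    for y in range(height):
        for x in range(width):
            # Gray values of this pixel across all frames.
            values = [frame[y][x] for frame in frames]
            # Low temporal deviation: pixel barely changes (caption-like).
            if pstdev(values) <= threshold:
                static.add((y, x))
            else:
                dynamic.add((y, x))
    return static, dynamic


# Toy input: the two left columns stay nearly constant, the right one varies.
frames = [
    [[200, 200, 10], [200, 200, 90]],
    [[200, 201, 60], [199, 200, 20]],
    [[201, 200, 120], [200, 200, 70]],
]
static, dynamic = split_static_dynamic(frames)
```

In this toy input the static set holds the four pixels of the two left columns and the dynamic set holds the two pixels of the right column. In a fuller pipeline, the two sets would be passed on to a separate step that analyzes gradient directions, which this sketch does not attempt.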

Acknowledgments

The work is also partly supported by the University of Malaya HIR under Grant No: UM.C/625/1/HIR/MOHE/ENG/42. The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which helped us to improve the quality and clarity of the paper significantly.

Author information

Corresponding author

Correspondence to Palaiahnakote Shivakumara.

About this article

Cite this article

Khare, V., Shivakumara, P., Paramesran, R. et al. Arbitrarily-oriented multi-lingual text detection in video. Multimed Tools Appl 76, 16625–16655 (2017). https://doi.org/10.1007/s11042-016-3941-x

  • DOI: https://doi.org/10.1007/s11042-016-3941-x
