Abstract
Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining windows, based on stroke width information, from which moments are extracted to tackle multi-font and multi-sized text in video. Temporal information is explored iteratively to find deviations between moving and non-moving pixels in successive frames, which results in static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are analyzed to identify potential text candidates. Furthermore, we propose boundary growing, which expands the boundary of each potential text candidate until it finds neighbor components based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video data, namely ICDAR 2013, ICDAR 2015, and YVT videos, and on our own English and multi-lingual videos demonstrate that the proposed method outperforms the state-of-the-art methods.
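The temporal step summarized above can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the deviation measure or the clustering rule, so the sketch assumes mean absolute frame differencing and a two-cluster 1-D k-means split. The function names `temporal_deviation` and `split_static_dynamic` and the parameter `max_iter` are hypothetical.

```python
import numpy as np


def temporal_deviation(frames):
    """Mean absolute intensity change per pixel across successive frames.

    Assumption: frames is a stack of grayscale images with shape (T, H, W).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two grayscale frames, shape (T, H, W)")
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)


def split_static_dynamic(deviation, max_iter=50):
    """Iteratively split pixels into a low-deviation and a high-deviation cluster.

    Returns a boolean mask that is True for pixels assigned to the static
    (low-deviation) cluster. Assumption: a plain two-cluster 1-D k-means
    stands in for whatever clustering the paper actually uses.
    """
    values = deviation.ravel()
    low, high = values.min(), values.max()
    if low == high:
        # No temporal variation at all: every pixel is static.
        return np.ones(deviation.shape, dtype=bool)
    for _ in range(max_iter):
        threshold = (low + high) / 2.0
        static = values <= threshold
        new_low = values[static].mean()
        new_high = values[~static].mean()
        if new_low == low and new_high == high:
            break  # cluster centres have stopped moving
        low, high = new_low, new_high
    return deviation <= (low + high) / 2.0


# Toy usage: a constant "caption" row and one flickering "scene" pixel.
frames = np.zeros((3, 4, 4))
frames[:, 0, :] = 200.0   # unchanged in every frame
frames[1, 2, 1] = 100.0   # changes between frames
static_mask = split_static_dynamic(temporal_deviation(frames))
```

In this toy input an unmoving background also lands in the static cluster, so the split alone does not isolate text; in the abstract's pipeline, the subsequent gradient-direction analysis is what identifies text candidates within each cluster.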
Acknowledgments
This work is partly supported by the University of Malaya HIR under Grant No. UM.C/625/1/HIR/MOHE/ENG/42. The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which helped us significantly improve the quality and clarity of the paper.
About this article
Cite this article
Khare, V., Shivakumara, P., Paramesran, R. et al. Arbitrarily-oriented multi-lingual text detection in video. Multimed Tools Appl 76, 16625–16655 (2017). https://doi.org/10.1007/s11042-016-3941-x
DOI: https://doi.org/10.1007/s11042-016-3941-x