Arbitrarily-oriented multi-lingual text detection in video

Khare, Vijeta; Shivakumara, Palaiahnakote; Paramesran, Raveendran; Blumenstein, Michael

doi:10.1007/s11042-016-3941-x

Published: 20 September 2016

Multimedia Tools and Applications, volume 76, pages 16625–16655 (2017)

Vijeta Khare¹, Palaiahnakote Shivakumara²,³, Raveendran Paramesran¹ & Michael Blumenstein⁴

Abstract

Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining the windows from which moments are extracted, based on stroke width information, in order to tackle multi-font and multi-sized text in video. Temporal information is explored to find deviations between moving and non-moving pixels in successive frames iteratively, which results in static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are analyzed to identify potential text candidates. Furthermore, a boundary growing step is proposed that expands the boundary of each potential text candidate until it finds neighboring components based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video data, namely the ICDAR 2013, ICDAR 2015 and YVT videos, and on our own English and multi-lingual videos, demonstrate that the proposed method outperforms state-of-the-art methods.
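
As a rough illustration of the temporal step described in the abstract, the sketch below labels each pixel as static or dynamic from its intensity deviation across successive frames. It is not the authors' procedure: the function name `split_static_dynamic`, the use of the per-pixel standard deviation, and the mean-deviation threshold are all assumptions made for this example, and the sketch runs a single pass rather than the iterative scheme the abstract mentions.

```python
from statistics import mean, pstdev


def split_static_dynamic(frames):
    """Label each pixel "static" or "dynamic" (hypothetical helper).

    frames: list of 2-D lists of gray values, all with the same shape.
    """
    height, width = len(frames[0]), len(frames[0][0])

    # Per-pixel standard deviation of intensity over time (assumed measure).
    deviation = [
        [pstdev([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]

    # Assumed threshold: the mean deviation over the whole frame.
    threshold = mean([value for row in deviation for value in row])

    # Pixels that vary more than the threshold are treated as dynamic.
    return [
        ["dynamic" if deviation[y][x] > threshold else "static"
         for x in range(width)]
        for y in range(height)
    ]


# Three tiny 2x2 frames: only the top-right pixel changes over time.
frames = [
    [[10, 0], [50, 90]],
    [[10, 100], [50, 90]],
    [[10, 200], [50, 90]],
]
for row in split_static_dynamic(frames):
    print(row)
```

In this toy input only the top-right pixel changes between frames, so it is the only one labeled dynamic. In the abstract's terms, the static group would be expected to hold caption text and the dynamic group scene text plus background.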

Acknowledgments

This work is partly supported by the University of Malaya HIR under Grant No. UM.C/625/1/HIR/MOHE/ENG/42. The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which helped significantly to improve the quality and clarity of the paper.

Author information

Authors and Affiliations

Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
Vijeta Khare & Raveendran Paramesran
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Palaiahnakote Shivakumara
Computer Systems and Information Technology, University of Malaya, BS-18, Annex Building, 50603, Malaysia
Palaiahnakote Shivakumara
School of Software, University of Technology Sydney, Sydney, Australia
Michael Blumenstein

Authors

Vijeta Khare
Palaiahnakote Shivakumara
Raveendran Paramesran
Michael Blumenstein

Corresponding author

Correspondence to Palaiahnakote Shivakumara.

About this article

Cite this article

Khare, V., Shivakumara, P., Paramesran, R. et al. Arbitrarily-oriented multi-lingual text detection in video. Multimed Tools Appl 76, 16625–16655 (2017). https://doi.org/10.1007/s11042-016-3941-x

Received: 07 February 2016
Revised: 11 July 2016
Accepted: 05 September 2016
Published: 20 September 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11042-016-3941-x
