Abstract
In this paper, we address two complex issues: (1) text frame classification and (2) multi-oriented text detection in video frames. We first divide a video frame into 16 blocks and, at the block level, combine wavelet and median-moment features with k-means clustering to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the block to identify potential text candidates. We introduce a new notion of symmetry on the text candidates in each block, based on the observation that the pixel distribution in text exhibits a symmetric pattern. The method then integrates all blocks containing text candidates in the frame and maps the text candidates onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest neighbor concept. APBG is applied to the text representatives to fix the bounding boxes of multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets, namely non-horizontal data, horizontal data, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images), show that the proposed method outperforms existing methods for video text detection as well as state-of-the-art methods for scene text detection.
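The block-level step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the exact definition of the median moments, or how the text cluster is chosen, so the sketch assumes a one-level averaging Haar transform, a two-element feature (the median of the detail magnitudes and their second-order moment about that median), and takes the cluster with the larger centre as the probable text cluster. All function names are hypothetical.

```python
import numpy as np


def split_into_blocks(frame, grid=4):
    """Split a 2-D grayscale frame into grid x grid blocks (16 for grid=4)."""
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    return [frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]


def haar_detail_magnitude(block):
    """One-level averaging Haar transform; sum of absolute detail sub-bands."""
    # Crop to even dimensions so the block tiles into 2x2 cells.
    b = block[:block.shape[0] // 2 * 2, :block.shape[1] // 2 * 2].astype(float)
    tl, tr = b[0::2, 0::2], b[0::2, 1::2]
    bl, br = b[1::2, 0::2], b[1::2, 1::2]
    d_rows = (tl + tr - bl - br) / 4.0   # top minus bottom
    d_cols = (tl - tr + bl - br) / 4.0   # left minus right
    d_diag = (tl - tr - bl + br) / 4.0   # diagonal difference
    return np.abs(d_rows) + np.abs(d_cols) + np.abs(d_diag)


def median_moment_feature(block):
    """Assumed feature: median of detail magnitudes and spread about it."""
    detail = haar_detail_magnitude(block)
    med = np.median(detail)
    second = np.mean((detail - med) ** 2)
    return np.array([med, second])


def two_means(features, iters=50):
    """Plain k-means with k=2 and a deterministic initialisation."""
    features = np.asarray(features, dtype=float)
    order = features.sum(axis=1)
    centers = np.stack([features[order.argmin()], features[order.argmax()]])
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def probable_text_blocks(frame, grid=4):
    """Return a boolean mask over the blocks, True for probable text blocks."""
    feats = [median_moment_feature(b) for b in split_into_blocks(frame, grid)]
    labels, centers = two_means(feats)
    if np.allclose(centers[0], centers[1]):
        # No contrast between blocks: nothing stands out as text.
        return np.zeros(len(feats), dtype=bool)
    text_cluster = centers.sum(axis=1).argmax()
    return labels == text_cluster
```

Calling `probable_text_blocks(frame)` on a grayscale frame yields one flag per block in row-major order. The same feature and clustering functions could be reused over a sliding window inside each flagged block, which is how the abstract describes the search for text candidates.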
Acknowledgments
This work was done jointly by the National University of Singapore and the Indian Statistical Institute, Kolkata, India. This research is supported in part by the A*STAR grant 092 101 0051 (WBS no. R252-000-402-305). We thank the anonymous reviewers for their valuable comments and suggestions, which improved the quality of the work. Our special thanks go to Prof. Andy Ming-Ham Yip, Department of Mathematics, National University of Singapore, for his helpful discussions and comments on wavelet operations and other mathematical details.
Additional information
Originality and contribution
This work makes three main contributions: (1) text frame classification using new symmetry features on text candidates; (2) multi-oriented text detection in video with good accuracy, for which we propose a new angle projection boundary growing method to tackle the multi-orientation problem; and (3) the best accuracy on ICDAR-03 data, according to the ICDAR-03 measures, compared with state-of-the-art methods. The originality lies in (1) the way we combine wavelet and median moments, (2) the definition of symmetry based on text pattern appearance, (3) the use of directional features for false positive elimination and (4) the angle projection boundary growing method for traversing multi-oriented texts.
About this article
Cite this article
Shivakumara, P., Dutta, A., Tan, C.L. et al. Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimed Tools Appl 72, 515–539 (2014). https://doi.org/10.1007/s11042-013-1385-0
DOI: https://doi.org/10.1007/s11042-013-1385-0