Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing

Multimedia Tools and Applications

Abstract

In this paper, we address two complex issues: (1) text frame classification and (2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the block to identify potential text candidates. We introduce a new idea of symmetry on the text candidates in each block, based on the observation that the pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and all text candidates are then mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest neighbor concept. APBG is applied to the text representatives to fix the bounding boxes of multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets, including non-horizontal text data, horizontal text data, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images), show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.
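The block-level step described above (16 blocks, wavelet features, median moments, k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the wavelet, the exact form of the median moments, or the clustering setup. The sketch assumes a one-level Haar transform, moments of order one and two taken about the median, and k-means with k = 2. All function names are hypothetical. The sliding-window, symmetry, Sobel mapping and APBG stages are not covered.

```python
from statistics import mean, median

def split_into_blocks(frame, grid=4):
    """Split a 2-D list of gray values into grid x grid blocks (4 x 4 = 16)."""
    bh, bw = len(frame) // grid, len(frame[0]) // grid
    return {
        (i, j): [row[j * bw:(j + 1) * bw] for row in frame[i * bh:(i + 1) * bh]]
        for i in range(grid) for j in range(grid)
    }

def haar_details(block):
    """Magnitudes of one-level 2-D Haar detail coefficients (assumed wavelet)."""
    out = []
    for r in range(0, len(block) - 1, 2):
        for c in range(0, len(block[0]) - 1, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            out += [abs(a + b - d - e) / 4,  # difference between rows
                    abs(a - b + d - e) / 4,  # difference between columns
                    abs(a - b - d + e) / 4]  # diagonal difference
    return out

def median_moments(values):
    """First- and second-order moments about the median (assumed definition)."""
    med = median(values)
    return (mean([abs(v - med) for v in values]),
            mean([(v - med) ** 2 for v in values]))

def two_means(points, iterations=20):
    """Plain k-means with k = 2, seeded with the smallest and largest point."""
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min((0, 1), key=lambda k: sum((x - y) ** 2
                                          for x, y in zip(p, centroids[k])))
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids

def probable_text_blocks(frame):
    """Return grid positions of blocks in the high-energy cluster."""
    blocks = split_into_blocks(frame)
    keys = sorted(blocks)
    feats = [median_moments(haar_details(blocks[k])) for k in keys]
    labels, centroids = two_means(feats)
    if centroids[0] == centroids[1]:  # no contrast between clusters
        return []
    text = 0 if centroids[0] > centroids[1] else 1
    return [k for k, lab in zip(keys, labels) if lab == text]

# Synthetic 16 x 16 frame: flat background with one striped (text-like) block.
frame = [[10] * 16 for _ in range(16)]
for r in range(4, 8):
    for c in range(8, 12):
        frame[r][c] = 255 if c % 2 == 0 else 0
print(probable_text_blocks(frame))  # [(1, 2)]
```

Treating the cluster with the larger centroid as the text cluster is a design choice of this sketch. It rests on the common assumption that text regions carry more high-frequency energy than the background.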

Acknowledgments

This work was done jointly by the National University of Singapore and the Indian Statistical Institute, Kolkata, India. This research is supported in part by the A*STAR grant 092 101 0051 (WBS no. R252-000-402-305). We thank the anonymous reviewers for their valuable comments and suggestions, which improved the quality of the work. Our special thanks go to Prof. Andy Ming-Ham Yip, Department of Mathematics, National University of Singapore, for his helpful discussions and comments on wavelet operations and other mathematical details.

Author information

Corresponding author

Correspondence to Palaiahnakote Shivakumara.

Additional information

Originality and contribution

This work makes three main contributions: (1) text frame classification using new symmetry features on text candidates; (2) multi-oriented text detection in video with good accuracy, for which we propose a new angle projection boundary growing method to tackle the multi-orientation problem; and (3) the best accuracy on ICDAR-03 data, according to ICDAR-03 measures, compared with state-of-the-art methods. Originality: (1) the way we combine wavelet and median moments, (2) the definition of symmetry based on text pattern appearance, (3) the use of directional features for false positive elimination and (4) the angle projection boundary growing method for traversing multi-oriented text.

About this article

Cite this article

Shivakumara, P., Dutta, A., Tan, C.L. et al. Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimed Tools Appl 72, 515–539 (2014). https://doi.org/10.1007/s11042-013-1385-0

  • DOI: https://doi.org/10.1007/s11042-013-1385-0
