Abstract
Automatically extracting text from natural images is useful for many applications, such as augmented reality. Most existing text detection systems require that the text to be detected (and recognized) in an image is viewed from a nearly frontal viewpoint. However, text in images taken casually with a camera or a mobile phone often undergoes significant affine or perspective deformation, making existing text detectors and the subsequent OCR engines prone to failure. In this paper, based on the stroke width transform and the texture-invariant low-rank transform, we propose a framework that can detect and rectify text in arbitrary orientations against complex backgrounds, so that the text can be correctly recognized by common OCR engines. Extensive experiments demonstrate the advantage of our method over state-of-the-art text detection systems.
Notes
Although −40∘, −20∘, 0∘, and +20∘ would be more natural choices of rotation angles, text in natural scenes is rarely exactly horizontal, and in our experience these angles are not as effective for detecting text in all directions.
Notice that this labeling is somewhat different from the ground-truth labeling used in the ICDAR dataset, in which each word is labeled separately. The dataset will be released for public evaluation if our paper is accepted for publication.
References
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970 (2010)
Zhang, Z., Ganesh, A., Liang, X., Ma, Y.: TILT: transform invariant low-rank textures. In: ACCV (2010)
Dinh, V.C., Chun, S.S., Cha, S., Ryu, H., Sull, S.: An efficient method for text detection in video based on stroke width similarity. In: ACCV, vol. 1, pp. 200–209 (2007)
Subramanian, K., Natarajan, P., Decerbo, M., Castañón, D.: Character-stroke detection for text-localization and extraction. In: ICDAR, pp. 33–37 (2007)
Jung, C., Liu, Q., Kim, J.: A stroke filter and its application to text localization. Pattern Recogn. Lett. 30(2), 114–122 (2009)
Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recogn. 37(5), 977–997 (2004)
Liang, J., Doermann, D.S., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Doc. Anal. Recognit. 7(2–3), 84–104 (2005)
Daniilidis, K., Maragos, P., Paragios, N. (eds.): Computer Vision—ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part I. Lecture Notes in Computer Science, vol. 6311. Springer, Berlin (2010)
Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: ICDAR, pp. 440–445 (2011)
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: ACCV, vol. 3, pp. 770–783 (2010)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV, pp. 1457–1464 (2011)
Shivakumara, P., Phan, T.Q., Tan, C.L.: New wavelet and color features for text detection in video. In: ICPR, pp. 3996–3999 (2010)
Ye, Q., Huang, Q., Gao, W., Zhao, D.: Fast and robust text detection in images and video frames. Image Vis. Comput. 23(6), 565–576 (2005)
Zhong, Y., Zhang, H., Jain, A.K.: Automatic caption localization in compressed video. IEEE TPAMI 22(4), 385–392 (2000)
Shivakumara, P., Huang, W., Tan, C.L.: Efficient video text detection using edge features. In: ICPR, pp. 1–4 (2008)
Wu, V., Manmatha, R., Riseman, E.M.: Textfinder: an automatic system to detect and recognize text in images. IEEE TPAMI 21(11), 1224–1229 (1999)
Kim, K.I., Jung, K., Kim, J.H.: Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE TPAMI 25(12), 1631–1639 (2003)
Li, H., Doermann, D.S., Kia, O.E.: Automatic text detection and tracking in digital video. IEEE TIP 9(1), 147–156 (2000)
Chen, X., Yuille, A.L.: Detecting and reading text in natural scenes. In: CVPR, vol. 2, pp. 366–373 (2004)
Zhang, J., Kasturi, R.: Text detection using edge gradient and graph spectrum. In: ICPR, pp. 3979–3982 (2010)
Shivakumara, P., Phan, T.Q., Tan, C.L.: Video text detection based on filters and edge features. In: ICME, pp. 514–517 (2009)
Yi, J., Peng, Y., Xiao, J.: Color-based clustering for text detection and extraction in image. In: ACM Multimedia, pp. 847–850 (2007)
Liang, J., DeMenthon, D., Doermann, D.: Geometric rectification of camera-captured document images. IEEE TPAMI 30(4), 591–605 (2008)
Pastor, M., Toselli, A., Vidal, E.: Projection profile based algorithm for slant removal. In: Image Analysis and Understanding. Lecture Notes in Computer Science, vol. 3212, pp. 183–190 (2004)
Singh, C., Bhatia, N., Kaur, A.: Hough transform based fast skew detection and accurate skew correction methods. Pattern Recogn. 41(12), 3528–3546 (2008)
Yuan, B., Tan, C.L.: Convex hull based skew estimation. Pattern Recogn. 40(2), 456–475 (2007)
Yan, H.: Skew correction of document images using interline cross-correlation. CVGIP, Graph. Models Image Process., 55(6), 538–543 (1993)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: Rasl: robust alignment by sparse and low-rank decomposition for linearly correlated images. In: CVPR, pp. 763–770 (2010)
Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215 (2009). arXiv:1009.5055
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR, pp. 682–687 (2003)
Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20(9), 2594–2605 (2011)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J.M., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2006 (VOC2006) results
ABBYY Corp.: ABBYY FineReader
HP Labs: Tesseract-OCR, 1985–1995. Now available at: http://code.google.com/p/tesseract-ocr/
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
Yin, W., Hale, E., Zhang, Y.: Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
Acknowledgements
X. Zhang and F. Sun are supported by the National Natural Science Foundation of China (NNSFC) under Grant No. 2013CB329403. Z. Lin is supported by the NNSFC under Grant Nos. 61272341, 61231002, and 61121002. Y. Ma is partially supported by the funding of ONR N00014-09-1-0230, NSF CCF 09-64215, NSF IIS 11-16012.
Author information
Authors and Affiliations
Corresponding author
Appendix: Detailed derivation of the ADM for multi-component TILT
1.1 A.1 ALM and ADM methods
The alternating direction method (ADM; also called the inexact augmented Lagrange multiplier (IALM) method in [29]) is a variant of the augmented Lagrange multiplier (ALM) method. The usual ALM method solves the following model problem:

\[\min_{x} f(x), \quad \text{s.t.}\ h(x)=0,\]

where \(f:\mathbb{R}^{n}\rightarrow \mathbb{R}\) and \(h:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\), by minimizing its augmented Lagrangian function:

\[L(x,y,\mu)=f(x)+\langle y,h(x)\rangle+\frac{\mu}{2}\|h(x)\|^{2},\]

where μ is a positive scalar and \(y\in \mathbb{R}^{m}\) is the Lagrange multiplier. The ALM method is outlined as Algorithm 1 (see [36] for more details).
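As a concrete illustration of the ALM iteration in Algorithm 1, the sketch below applies it to a toy equality-constrained problem; the toy objective, the function name, and all parameters are our own illustrative assumptions, not part of the method in this paper.

```python
import numpy as np

def alm_toy(a, b, c, mu=1.0, rho=2.0, iters=30):
    """ALM sketch for: min_x ||x - c||^2  s.t.  a.x = b."""
    x = np.zeros_like(c)
    y = 0.0  # Lagrange multiplier for the single scalar constraint
    for _ in range(iters):
        # The x-subproblem min_x L(x, y, mu) is quadratic here, so it
        # has the closed form (2I + mu*a*a^T) x = 2c - y*a + mu*b*a.
        A = 2.0 * np.eye(len(c)) + mu * np.outer(a, a)
        x = np.linalg.solve(A, 2.0 * c - y * a + mu * b * a)
        y += mu * (a @ x - b)  # multiplier update
        mu *= rho              # increase the penalty parameter
    return x
```

For instance, `alm_toy(np.array([1.0, 1.0]), 1.0, np.array([1.0, 2.0]))` converges to the Euclidean projection of c onto the hyperplane a·x = b, here (0, 1).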
Solving the subproblem

\[x^{(k+1)}=\mathop{\arg\min}_{x} L\bigl(x,y^{(k)},\mu^{(k)}\bigr)\]

is actually not always easy. When f(x) is separable, e.g., \(f(x)=f_{1}(a)+f_{2}(b)\), where \(x=(a^{T},b^{T})^{T}\), we may solve this subproblem by minimizing over a and b alternately until convergence. However, when f(x) is convex and h(x) is a linear function of x, the ADM updates a and b only once per iteration, and the iteration can still be proven to converge to the optimal solution [37]. The ADM for two variables is outlined as Algorithm 2. The generalization to the multi-variable and multi-constraint case is straightforward.
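The single-pass alternation of Algorithm 2 can be sketched on a toy splitting, min ‖a‖₁ + ½‖b − d‖² subject to a = b; this objective and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shrink(x, eps):
    # soft-thresholding (shrinkage) operator
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def adm_toy(d, mu=1.0, iters=200):
    """ADM sketch for: min_{a,b} ||a||_1 + 0.5*||b - d||^2  s.t.  a = b."""
    a = np.zeros_like(d)
    b = d.copy()
    y = np.zeros_like(d)  # Lagrange multiplier
    for _ in range(iters):
        # each block is updated ONCE per iteration, with no inner loop
        a = shrink(b - y / mu, 1.0 / mu)       # a-subproblem (shrinkage)
        b = (d + mu * a + y) / (1.0 + mu)      # b-subproblem (quadratic)
        y += mu * (a - b)                      # multiplier update
    return a
```

At convergence a = b minimizes ‖x‖₁ + ½‖x − d‖², i.e., the soft-thresholding of d with threshold 1; e.g., d = (3, 0.5, −2) yields (2, 0, −1).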
1.2 A.2 Subproblems of multi-component TILT
When applying ADM to the linearized multi-component TILT, whose augmented Lagrangian function is
where \(\mathcal{P}_{i}(A)\) extracts the ith block of A, we have to solve multiple subproblems:
When solving (11), by leaving out the terms in L that are independent of \(A_{i}\), we see
where
Then by Theorem 2.1 of [38], \(A_{i}^{(k+1)}\) can be obtained by thresholding the singular values of \(\hat{A}_{i}^{(k+1)}\). Namely, if the singular value decomposition (SVD) of \(\hat{A}_{i}^{(k+1)}\) is \(U_{ik}\varSigma_{ik}V_{ik}^{T}\), then

\[A_{i}^{(k+1)}=U_{ik}\,\mathcal{S}_{(\mu^{(k)})^{-1}}\bigl[\varSigma_{ik}\bigr]\,V_{ik}^{T},\]

where

\[\mathcal{S}_{\varepsilon}[x]=\operatorname{sgn}(x)\max\bigl(|x|-\varepsilon,0\bigr)\]

is the shrinkage operator.
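In code, this singular value thresholding step looks as follows; a minimal NumPy sketch, where `svt(M, eps)` returns the minimizer of \(\varepsilon\|A\|_{*}+\frac{1}{2}\|A-M\|_{F}^{2}\) (Theorem 2.1 of [38]), and the function names are our own.

```python
import numpy as np

def shrink(x, eps):
    # shrinkage operator S_eps[x] = sgn(x) * max(|x| - eps, 0)
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, eps):
    # threshold the singular values of M and rebuild the matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(s, eps)) @ Vt
```

For example, `svt(np.diag([3.0, 0.5]), 1.0)` shrinks the singular value 3 to 2 and zeroes out the one below the threshold, giving `diag([2.0, 0.0])`.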
Similarly, when solving (12), after some elaboration we have
where
in which \(\tilde{A}^{(k+1)}\) is a matrix whose ith block is \(A_{i}^{(k+1)}+(\mu^{(k)})^{-1}Y_{i}^{(k)}\). Then again we can apply Theorem 2.1 of [38] to solve for \(A^{(k+1)}\) as we did above for \(A_{i}^{(k+1)}\).
When solving (13), we have
where
Then it is well known [39] that the solution to the above problem is

\[E^{(k+1)}=\mathcal{S}_{\lambda/\mu^{(k)}}\bigl[\hat{E}^{(k+1)}\bigr],\]

where \(\mathcal{S}\) is the shrinkage operator defined above.
When solving (14), we have
where
So

\[\Delta\tau^{(k+1)}=J^{\dagger}\,\widehat{\Delta\tau}^{(k+1)},\]

where \(J^{\dagger}\) is the pseudo-inverse of J and \(\widehat{\Delta\tau}^{(k+1)}\) is rearranged into a vector.
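Numerically, applying the pseudo-inverse amounts to a least-squares solve; a minimal sketch, where the function name and the small example system are illustrative assumptions.

```python
import numpy as np

def update_dtau(J, rhs):
    # Delta tau = J^dagger @ rhs; pinv gives the minimum-norm
    # least-squares solution even when J is rank-deficient
    return np.linalg.pinv(J) @ rhs
```

For a well-conditioned J, `np.linalg.lstsq(J, rhs, rcond=None)[0]` gives the same solution and avoids forming the pseudo-inverse explicitly.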
1.3 A.3 The complete algorithm
Now we can present the ADM for multi-component TILT as Algorithm 3.
Note that in Algorithm 3, the transformed image D∘τ has to be normalized to prevent the region of interest from shrinking to a black pixel [2]. For more implementation details, such as maintaining the area and the aspect ratio of the region, please refer to [2]. We will also post our code online if the paper is accepted.
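The normalization can be implemented by rescaling the transformed window to unit Frobenius norm, as in TILT [2]; below is a minimal sketch, in which the function name and the zero-norm guard are our own assumptions.

```python
import numpy as np

def normalize_window(D_tau):
    # scale D∘τ to unit Frobenius norm so that shrinking the region of
    # interest toward a dark point does not trivially lower the objective
    nrm = np.linalg.norm(D_tau, 'fro')
    if nrm < 1e-12:  # guard against an all-zero window (assumption)
        return D_tau
    return D_tau / nrm
```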
Cite this article
Zhang, X., Lin, Z., Sun, F. et al. Transform invariant text extraction. Vis Comput 30, 401–415 (2014). https://doi.org/10.1007/s00371-013-0864-7